Message ID | 1444935658-27319-1-git-send-email-lvivier@redhat.com (mailing list archive) |
---|---|
State | Rejected, archived |
Hi Laurent,

[auto build test ERROR on powerpc/next -- if it's inappropriate base, please suggest rules for selecting the more suitable base]

url:    https://github.com/0day-ci/linux/commits/Laurent-Vivier/powerpc-on-crash-kexec-ed-kernel-needs-all-CPUs-are-online/20151016-030306
config: powerpc-wii_defconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/crash.c: In function 'wake_offline_cpus':
>> arch/powerpc/kernel/crash.c:315:4: error: implicit declaration of function 'cpu_up' [-Werror=implicit-function-declaration]
       cpu_up(cpu);
       ^
   cc1: all warnings being treated as errors

vim +/cpu_up +315 arch/powerpc/kernel/crash.c

   309	{
   310		int cpu = 0;
   311
   312		for_each_present_cpu(cpu) {
   313			if (!cpu_online(cpu)) {
   314				pr_info("kexec: Waking offline cpu %d.\n", cpu);
 > 315				cpu_up(cpu);
   316			}
   317		}
   318	}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
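The error above is consistent with cpu_up() being declared only for SMP builds, and with wii_defconfig being a uniprocessor configuration. Both points are assumptions here; the report itself does not say so. Under those assumptions, the usual way out is to compile the helper only when SMP is enabled and to provide an empty stub otherwise. The following is a user-space sketch of that pattern, not kernel code: MODEL_SMP, MODEL_NR_CPUS and model_online are hypothetical stand-ins, and this cpu_up() is a local mock rather than the kernel function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * User-space model, not kernel code. MODEL_SMP is a hypothetical stand-in
 * for CONFIG_SMP; leave it undefined to model a uniprocessor build such as
 * the one that failed above.
 */
#ifdef MODEL_SMP
#define MODEL_NR_CPUS 8

/* Hypothetical online map: threads 0 and 4 are online, the rest are not. */
static int model_online[MODEL_NR_CPUS] = { 1, 0, 0, 0, 1, 0, 0, 0 };

/* Local mock of cpu_up(); it only exists in the SMP model. */
static int cpu_up(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_online[cpu] = 1;
	return 0;
}

/* Returns the number of CPUs it tried to wake. */
static int wake_offline_cpus(void)
{
	int woken = 0;

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!model_online[cpu]) {
			printf("kexec: Waking offline cpu %u.\n", cpu);
			cpu_up(cpu);
			woken++;
		}
	}
	return woken;
}
#else
/* Uniprocessor model: there are no secondaries, so the helper is a stub. */
static inline int wake_offline_cpus(void)
{
	return 0;
}
#endif
```

Built without MODEL_SMP, wake_offline_cpus() compiles to the stub and no reference to cpu_up() remains, which is the property the failing configuration needs. This only addresses the build failure; it says nothing about whether calling cpu_up() from the crash path is a good idea.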
On Thu, 2015-10-15 at 21:00 +0200, Laurent Vivier wrote:
> On kexec, all secondary offline CPUs are onlined before
> starting the new kernel, this is not done in the case of kdump.
>
> If kdump is configured and a kernel crash occurs whereas
> some secondaries CPUs are offline (SMT=off),
> the new kernel is not able to start them and displays some
> "Processor X is stuck.".

Do we know why they are stuck?

I really don't like this fix. The reason we're doing a kdump is because the
first kernel has panicked, possibly with locks held or data structures
corrupted. Calling cpu_up() then goes and tries to run a bunch of code in the
crashed kernel, which increases the chance of us just wedging completely.

cheers
On Thu, 15 Oct 2015 21:00:58 +0200
Laurent Vivier <lvivier@redhat.com> wrote:

> On kexec, all secondary offline CPUs are onlined before
> starting the new kernel, this is not done in the case of kdump.
>
> If kdump is configured and a kernel crash occurs whereas
> some secondaries CPUs are offline (SMT=off),
> the new kernel is not able to start them and displays some
> "Processor X is stuck.".
>
> Starting with POWER8, subcore logic relies on all threads of
> core being booted. So, on startup kernel tries to start all
> threads, and asks OPAL (or RTAS) to start all CPUs (including
> threads). If a CPU has been offlined by the previous kernel,
> it has not been returned to OPAL, and thus OPAL cannot restart
> it: this CPU has been lost...
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Nice analysis of the problem. But, I'm a bit uneasy about this approach
to fixing it: Onlining potentially hundreds of CPU threads seems like a
risky operation in a kernel that's already crashed.

I don't have a terribly clear idea of what is the best way to address
this. Here's a few ideas in the right general direction:

  * I'm already looking into a kdump userspace fix to stop it
    attempting to bring up secondary CPUs

  * A working kernel option to say "only allow this many online cpus
    ever" which we could pass to the kdump kernel would be nice

  * Paulus had an idea about offline threads returning themselves
    directly to OPAL by kicking a flag at kdump/kexec time.

BenH, Paulus,

OPAL <-> kernel cpu transitions don't seem to work quite how I thought
they would. IIUC there's a register we can use to directly control
which threads on a core are active. Given that, I would have thought cpu
"ownership" OPAL vs. kernel would be on a per-core, rather than
per-thread basis.

Is there some way we can change the CPU onlining / offlining code so
that if threads aren't in OPAL, we directly enable them, rather than
just hoping they're in a nap loop somewhere?
On 16/10/2015 04:14, Michael Ellerman wrote:
> On Thu, 2015-10-15 at 21:00 +0200, Laurent Vivier wrote:
>> On kexec, all secondary offline CPUs are onlined before
>> starting the new kernel, this is not done in the case of kdump.
>>
>> If kdump is configured and a kernel crash occurs whereas
>> some secondaries CPUs are offline (SMT=off),
>> the new kernel is not able to start them and displays some
>> "Processor X is stuck.".
>
> Do we know why they are stuck?

Yes, we know :)

On the crash, as the CPUs are offline, the kernel doesn't call
opal_return_cpu(), so for OPAL all these CPUs are still in the kernel.

When the new kernel starts, it calls opal_query_cpu_status() to know
which CPUs are available. As they were not returned to OPAL, these CPUs
are not available, but as the kernel logic relies on the fact that they
must be available (the logic is SMT is on), it waits for them to start
and waits forever...

When the kernel starts, all secondary processors are started by a call,
for each of them, to __cpu_up():

__cpu_up()
...
	cpu_callin_map[cpu] = 0;
...
	rc = smp_ops->kick_cpu(cpu);
...wait...
	if (!cpu_callin_map[cpu]) {
		printk(KERN_ERR "Processor %u is stuck.\n", cpu);
...

On powernv, kick_cpu() is pnv_smp_kick_cpu():

pnv_smp_kick_cpu()
...
	unsigned long start_here =
		__pa(ppc_function_entry(generic_secondary_smp_init));
...
	/*
	 * Already started, just kick it, probably coming from
	 * kexec and spinning
	 */
	rc = opal_query_cpu_status(pcpu, &status);
...
	if (status == OPAL_THREAD_STARTED)
		goto kick;
...
	rc = opal_start_cpu(pcpu, start_here);
...
kick:
...

generic_secondary_smp_init() is a function in assembly language that
ends up calling start_secondary():

start_secondary()
...
	cpu_callin_map[cpu] = 1;
...

So processors are stuck because start_secondary() is never called.

start_secondary() is never called because the OPAL cpu status is
OPAL_THREAD_STARTED.

Secondary CPUs are in the "OPAL_THREAD_STARTED" state because they have
not been returned to OPAL on crash.

CPUs are returned to OPAL by pnv_kexec_cpu_down(), which is called by
crash_ipi_callback() (for secondary cpus)... except if the cpu is not
online.

As the CPUs are offline, they are not returned to OPAL, and then the
kernel can't restart them.

> I really don't like this fix. The reason we're doing a kdump is because the
> first kernel has panicked, possibly with locks held or data structures
> corrupted. Calling cpu_up() then goes and tries to run a bunch of code in the
> crashed kernel, which increases the chance of us just wedging completely.

I agree, but the whole logic of the POWER kernel is that we have all the
threads available. Moreover, the kernel parameter "maxcpus" is ignored if
it is not a multiple of threads per core:

...
static int subcore_init(void)
{
	if (!cpu_has_feature(CPU_FTR_ARCH_207S))
		return 0;

	/*
	 * We need all threads in a core to be present to split/unsplit so
	 * continue only if max_cpus are aligned to threads_per_core.
	 */
	if (setup_max_cpus % threads_per_core)
		return 0;
...
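The chain of reasoning in this message can be checked with a small user-space model. The sketch below is not kernel or firmware code: every model_* name is hypothetical, and the two-state firmware view is a simplification of what opal_query_cpu_status() reports. It assumes, as described above, that a thread the firmware considers started is only kicked, and that only a thread actually spinning in the kernel can answer a kick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 8	/* one core with 8 threads, as in "1 to 7 for core 0" */

/* Hypothetical, simplified firmware view of a thread. */
enum model_thread_status {
	MODEL_THREAD_INACTIVE,	/* returned to firmware, can be started */
	MODEL_THREAD_STARTED,	/* firmware believes the kernel owns it */
};

struct model_cpu {
	enum model_thread_status fw_status;
	bool spinning_in_kernel;	/* would respond to a kick */
	bool called_in;			/* models cpu_callin_map[cpu] */
};

/* Models pnv_smp_kick_cpu() followed by the wait in __cpu_up(). */
static bool model_cpu_up(struct model_cpu *c)
{
	assert(c != NULL);
	c->called_in = false;

	if (c->fw_status != MODEL_THREAD_STARTED) {
		/* Models opal_start_cpu(): firmware hands the thread over. */
		c->fw_status = MODEL_THREAD_STARTED;
		c->spinning_in_kernel = true;
	}

	/* kick: only a thread that is really spinning reaches start_secondary(). */
	if (c->spinning_in_kernel)
		c->called_in = true;

	return c->called_in;
}

/*
 * Models the hand-over from the first kernel to the second one and returns
 * how many secondaries the second kernel reports as stuck. online[] is the
 * state in the first kernel; crash selects kdump (true) or plain kexec.
 */
static int model_count_stuck(const bool *online, bool crash)
{
	struct model_cpu cpus[MODEL_NR_CPUS] = { 0 };
	int stuck = 0;

	for (int cpu = 1; cpu < MODEL_NR_CPUS; cpu++) {
		/* In the first kernel every secondary belongs to the kernel. */
		cpus[cpu].fw_status = MODEL_THREAD_STARTED;
		cpus[cpu].spinning_in_kernel = false;

		/*
		 * Plain kexec onlines offline CPUs before returning them;
		 * the crash path only returns CPUs that are online.
		 */
		if (online[cpu] || !crash)
			cpus[cpu].fw_status = MODEL_THREAD_INACTIVE;
	}

	for (int cpu = 1; cpu < MODEL_NR_CPUS; cpu++) {
		if (!model_cpu_up(&cpus[cpu])) {
			printf("Processor %d is stuck.\n", cpu);
			stuck++;
		}
	}
	return stuck;
}
```

With only thread 0 online (SMT=off) and crash set, the model reports the seven secondaries as stuck; with crash cleared, or with every thread online, it reports none. That matches the behaviour described in this thread, but the model encodes the explanation rather than proving it.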
On 16/10/2015 04:29, David Gibson wrote:
> On Thu, 15 Oct 2015 21:00:58 +0200 Laurent Vivier
> <lvivier@redhat.com> wrote:
>
>> On kexec, all secondary offline CPUs are onlined before starting
>> the new kernel, this is not done in the case of kdump.
>>
>> If kdump is configured and a kernel crash occurs whereas some
>> secondaries CPUs are offline (SMT=off), the new kernel is not
>> able to start them and displays some "Processor X is stuck.".
>>
>> Starting with POWER8, subcore logic relies on all threads of core
>> being booted. So, on startup kernel tries to start all threads,
>> and asks OPAL (or RTAS) to start all CPUs (including threads). If
>> a CPU has been offlined by the previous kernel, it has not been
>> returned to OPAL, and thus OPAL cannot restart it: this CPU has
>> been lost...
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>
> Nice analysis of the problem. But, I'm a bit uneasy about this
> approach to fixing it: Onlining potentially hundreds of CPU threads
> seems like a risky operation in a kernel that's already crashed.

I agree.

> I don't have a terribly clear idea of what is the best way to
> address this. Here's a few ideas in the right general direction:
>
> * I'm already looking into a kdump userspace fixes to stop it
> attempting to bring up secondary CPUs
>
> * A working kernel option to say "only allow this many online cpus
> ever" which we could pass to the kdump kernel would be nice
>
> * Paulus had an idea about offline threads returning themselves
> directly to OPAL by kicking a flag at kdump/kexec time.

For me the problem is: as these CPUs are offline, I guess the core has
been switched to 1 thread per core, so the CPUs (1 to 7 for core 0)
don't exist anymore, how can we return them to OPAL ?

>
> BenH, Paulus,
>
> OPAL <-> kernel cpu transitions don't seem to work quite how I
> thought they would. IIUC there's a register we can use to directly
> control which threads on a core are active. Given that I would
> have thought cpu "ownership" OPAL vs. kernel would be on a
> per-core, rather than per-thread basis.
>
> Is there some way we can change the CPU onlining / offlining code
> so that if threads aren't in OPAL, we directly enable them, rather
> than just hoping they're in a nap loop somewhere?
>

Laurent
On Fri, 2015-10-16 at 09:48 +0200, Laurent Vivier wrote:
>
> Yes, we know :)
>
> On the crash, as the CPUs are offline, kernel doesn't call
> opal_return_cpu(), so for OPAL all these CPU are always in the
> kernel.

Hrm and they may even be in winkle state, so basically off... waking
them up *could* be a tricky business.

I suppose we could, near the last stage of kexec, patch the 0x100 vector
to send anybody coming in to a kexec wait loop, and then machine gun
the IPIs.

But that will make them come in with an unclean ICP needing an EOI, I'm
not sure we handle that very well.

Ideally we could just soft-reset them but that's broken on P8.

Cheers,
Ben.
On Fri, 2015-10-16 at 09:57 +0200, Laurent Vivier wrote:
> For me the problem is: as these CPUs are offline, I guess the core has
> been switched to 1 thread per core, so the CPUs (1 to 7 for core 0)
> don't exist anymore, how can we return them to OPAL ?

Another option is to make the new kernel's kick_cpu fall back, if it
knows it's coming up as a crashdump, to sending IPIs. We would need some
sane way to catch the guys coming in at 0x100 and route them to
secondary start.

Cheers,
Ben.
On 10/16/2015 12:30 AM, Laurent Vivier wrote:
> On kexec, all secondary offline CPUs are onlined before
> starting the new kernel, this is not done in the case of kdump.
>
> If kdump is configured and a kernel crash occurs whereas
> some secondaries CPUs are offline (SMT=off),
> the new kernel is not able to start them and displays some
> "Processor X is stuck.".
>
> Starting with POWER8, subcore logic relies on all threads of
> core being booted. So, on startup kernel tries to start all
> threads, and asks OPAL (or RTAS) to start all CPUs (including
> threads). If a CPU has been offlined by the previous kernel,
> it has not been returned to OPAL, and thus OPAL cannot restart
> it: this CPU has been lost...
>
> Signed-off-by: Laurent Vivier<lvivier@redhat.com>

Hi Laurent,

Sorry for jumping too late into this.
Are you seeing this issue even with the below patches:

pseries:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c1caae3de46a072d0855729aed6e793e536a4a55

opal/powernv:
https://github.com/open-power/skiboot/commit/9ee56b5

Thanks
Hari

> ---
>  arch/powerpc/kernel/crash.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
> index 51dbace..3ca9452 100644
> --- a/arch/powerpc/kernel/crash.c
> +++ b/arch/powerpc/kernel/crash.c
> @@ -19,6 +19,7 @@
>  #include <linux/delay.h>
>  #include <linux/irq.h>
>  #include <linux/types.h>
> +#include <linux/cpu.h>
>
>  #include <asm/processor.h>
>  #include <asm/machdep.h>
> @@ -299,11 +300,30 @@ int crash_shutdown_unregister(crash_shutdown_t handler)
>  }
>  EXPORT_SYMBOL(crash_shutdown_unregister);
>
> +/*
> + * The next kernel will try to start all secondary CPUs and if
> + * there are not online it will fail to start them.
> + *
> + */
> +static void wake_offline_cpus(void)
> +{
> +	int cpu = 0;
> +
> +	for_each_present_cpu(cpu) {
> +		if (!cpu_online(cpu)) {
> +			pr_info("kexec: Waking offline cpu %d.\n", cpu);
> +			cpu_up(cpu);
> +		}
> +	}
> +}
> +
>  void default_machine_crash_shutdown(struct pt_regs *regs)
>  {
>  	unsigned int i;
>  	int (*old_handler)(struct pt_regs *regs);
>
> +	wake_offline_cpus();
> +
>  	/*
>  	 * This function is only called after the system
>  	 * has panicked or is otherwise in a critical state.
On 04/11/2015 13:34, Hari Bathini wrote:
> On 10/16/2015 12:30 AM, Laurent Vivier wrote:
>> On kexec, all secondary offline CPUs are onlined before
>> starting the new kernel, this is not done in the case of kdump.
>>
>> If kdump is configured and a kernel crash occurs whereas
>> some secondaries CPUs are offline (SMT=off),
>> the new kernel is not able to start them and displays some
>> "Processor X is stuck.".
>>
>> Starting with POWER8, subcore logic relies on all threads of
>> core being booted. So, on startup kernel tries to start all
>> threads, and asks OPAL (or RTAS) to start all CPUs (including
>> threads). If a CPU has been offlined by the previous kernel,
>> it has not been returned to OPAL, and thus OPAL cannot restart
>> it: this CPU has been lost...
>>
>> Signed-off-by: Laurent Vivier<lvivier@redhat.com>
>
>
> Hi Laurent,

Hi Hari,

> Sorry for jumping too late into this.

better late than never :)

> Are you seeing this issue even with the below patches:
>
> pseries:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c1caae3de46a072d0855729aed6e793e536a4a55
>
>
> opal/powernv:
> https://github.com/open-power/skiboot/commit/9ee56b5

Very interesting. Is there a way to have a firmware with the fix ?

Thanks,
Laurent

> Thanks
> Hari
>
>> ---
>>   arch/powerpc/kernel/crash.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
>> index 51dbace..3ca9452 100644
>> --- a/arch/powerpc/kernel/crash.c
>> +++ b/arch/powerpc/kernel/crash.c
>> @@ -19,6 +19,7 @@
>>   #include <linux/delay.h>
>>   #include <linux/irq.h>
>>   #include <linux/types.h>
>> +#include <linux/cpu.h>
>>     #include <asm/processor.h>
>>   #include <asm/machdep.h>
>> @@ -299,11 +300,30 @@ int crash_shutdown_unregister(crash_shutdown_t
>> handler)
>>   }
>>   EXPORT_SYMBOL(crash_shutdown_unregister);
>>   +/*
>> + * The next kernel will try to start all secondary CPUs and if
>> + * there are not online it will fail to start them.
>> + *
>> + */
>> +static void wake_offline_cpus(void)
>> +{
>> +    int cpu = 0;
>> +
>> +    for_each_present_cpu(cpu) {
>> +        if (!cpu_online(cpu)) {
>> +            pr_info("kexec: Waking offline cpu %d.\n", cpu);
>> +            cpu_up(cpu);
>> +        }
>> +    }
>> +}
>> +
>>   void default_machine_crash_shutdown(struct pt_regs *regs)
>>   {
>>       unsigned int i;
>>       int (*old_handler)(struct pt_regs *regs);
>>   +    wake_offline_cpus();
>> +
>>       /*
>>        * This function is only called after the system
>>        * has panicked or is otherwise in a critical state.
>
On Wed, 4 Nov 2015 14:54:51 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

>
>
> On 04/11/2015 13:34, Hari Bathini wrote:
> > On 10/16/2015 12:30 AM, Laurent Vivier wrote:
> >> On kexec, all secondary offline CPUs are onlined before
> >> starting the new kernel, this is not done in the case of kdump.
> >>
> >> If kdump is configured and a kernel crash occurs whereas
> >> some secondaries CPUs are offline (SMT=off),
> >> the new kernel is not able to start them and displays some
> >> "Processor X is stuck.".
> >>
> >> Starting with POWER8, subcore logic relies on all threads of
> >> core being booted. So, on startup kernel tries to start all
> >> threads, and asks OPAL (or RTAS) to start all CPUs (including
> >> threads). If a CPU has been offlined by the previous kernel,
> >> it has not been returned to OPAL, and thus OPAL cannot restart
> >> it: this CPU has been lost...
> >>
> >> Signed-off-by: Laurent Vivier<lvivier@redhat.com>
> >
> >
> > Hi Laurent,
>
> Hi Hari,
>
> > Sorry for jumping too late into this.
>
> better late than never :)
>
> > Are you seeing this issue even with the below patches:
> >
> > pseries:
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c1caae3de46a072d0855729aed6e793e536a4a55

Unfortunately, this is unlikely to be relevant - this fixes a failure
while setting up the kexec. The problem we see occurs once we've
booted the second kernel and it's attempting to bring up secondary CPUs.

> > opal/powernv:
> > https://github.com/open-power/skiboot/commit/9ee56b5
>
> Very interesting. Is there a way to have a firmware with the fix ?

From Laurent's analysis of the crash, I don't think this will be
relevant either, but I'm not sure. It would be very interesting to
know which (if any) released firmwares include this patch so we can
test it.
David Gibson <dgibson@redhat.com> writes:
>> > opal/powernv:
>> > https://github.com/open-power/skiboot/commit/9ee56b5
>>
>> Very interesting. Is there a way to have a firmware with the fix ?
>
> From Laurent's analysis of the crash, I don't think this will be
> relevant either, but I'm not sure. It would be very interesting to
> know which (if any) released firmwares include this patch so we can
> test it.

It'll be on the (just released) IBM LC machines (the ones with the AMI
BMC) and will be in the next major firmware version for FSP based
machines (the -L machines) FW840, which should be out in the next month.
Let me know if you want a build of that, we should be able to get one to
you.

For any OpenPower machine you can always build a custom skiboot and
flash it :)
On 11/05/2015 07:02 AM, David Gibson wrote:
> On Wed, 4 Nov 2015 14:54:51 +0100
> Laurent Vivier <lvivier@redhat.com> wrote:
>
>>
>> On 04/11/2015 13:34, Hari Bathini wrote:
>>> On 10/16/2015 12:30 AM, Laurent Vivier wrote:
>>>> On kexec, all secondary offline CPUs are onlined before
>>>> starting the new kernel, this is not done in the case of kdump.
>>>>
>>>> If kdump is configured and a kernel crash occurs whereas
>>>> some secondaries CPUs are offline (SMT=off),
>>>> the new kernel is not able to start them and displays some
>>>> "Processor X is stuck.".
>>>>
>>>> Starting with POWER8, subcore logic relies on all threads of
>>>> core being booted. So, on startup kernel tries to start all
>>>> threads, and asks OPAL (or RTAS) to start all CPUs (including
>>>> threads). If a CPU has been offlined by the previous kernel,
>>>> it has not been returned to OPAL, and thus OPAL cannot restart
>>>> it: this CPU has been lost...
>>>>
>>>> Signed-off-by: Laurent Vivier<lvivier@redhat.com>
>>>
>>> Hi Laurent,
>> Hi Hari,
>>
>>> Sorry for jumping too late into this.
>> better late than never :)
>>
>>> Are you seeing this issue even with the below patches:
>>>
>>> pseries:
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c1caae3de46a072d0855729aed6e793e536a4a55
> Unfortunately, this is unlikely to be relevant - this fixes a failure
> while setting up the kexec. The problem we see occurs once we've
> booted the second kernel and it's attempting to bring up secondary CPUs.
>
>>> opal/powernv:
>>> https://github.com/open-power/skiboot/commit/9ee56b5
>> Very interesting. Is there a way to have a firmware with the fix ?
> From Laurent's analysis of the crash, I don't think this will be
> relevant either, but I'm not sure. It would be very interesting to
> know which (if any) released firmwares include this patch so we can
> test it.

Hi Laurent/David,

I am not so sure on this. While I get back on this, can you confirm
you are seeing the issue in both PowerVM (pseries) and baremetal (powernv).
What is the kernel version where the issue is seen for PowerVM and/or
baremetal. Also, for baremetal, can you mention the OPAL version on which
the issue is reproducible.

If a bug is raised for this, I would be happy to be pointed to, to get
more information on this.

Thanks
Hari

>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index 51dbace..3ca9452 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -19,6 +19,7 @@
 #include <linux/delay.h>
 #include <linux/irq.h>
 #include <linux/types.h>
+#include <linux/cpu.h>
 
 #include <asm/processor.h>
 #include <asm/machdep.h>
@@ -299,11 +300,30 @@ int crash_shutdown_unregister(crash_shutdown_t handler)
 }
 EXPORT_SYMBOL(crash_shutdown_unregister);
 
+/*
+ * The next kernel will try to start all secondary CPUs and if
+ * there are not online it will fail to start them.
+ *
+ */
+static void wake_offline_cpus(void)
+{
+	int cpu = 0;
+
+	for_each_present_cpu(cpu) {
+		if (!cpu_online(cpu)) {
+			pr_info("kexec: Waking offline cpu %d.\n", cpu);
+			cpu_up(cpu);
+		}
+	}
+}
+
 void default_machine_crash_shutdown(struct pt_regs *regs)
 {
 	unsigned int i;
 	int (*old_handler)(struct pt_regs *regs);
 
+	wake_offline_cpus();
+
 	/*
 	 * This function is only called after the system
 	 * has panicked or is otherwise in a critical state.
On kexec, all secondary offline CPUs are onlined before
starting the new kernel, this is not done in the case of kdump.

If kdump is configured and a kernel crash occurs whereas
some secondaries CPUs are offline (SMT=off),
the new kernel is not able to start them and displays some
"Processor X is stuck.".

Starting with POWER8, subcore logic relies on all threads of
core being booted. So, on startup kernel tries to start all
threads, and asks OPAL (or RTAS) to start all CPUs (including
threads). If a CPU has been offlined by the previous kernel,
it has not been returned to OPAL, and thus OPAL cannot restart
it: this CPU has been lost...

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 arch/powerpc/kernel/crash.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)