Message ID | 1437544469-20028-1-git-send-email-sam.mj@au1.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Samuel Mendoza-Jonas <sam.mj@au1.ibm.com> writes:
> Always include a timeout when waiting for secondary cpus to enter OPAL
> in the kexec path, rather than only when crashing.

This *sounds* reasonable... but I wonder what the actual worst case could
be and why we'd get stuck too long waiting for things?

What was the original bug/problem that inspired this patch?

And is 1s enough?
On 27/07/15 15:56, Stewart Smith wrote:
> Samuel Mendoza-Jonas <sam.mj@au1.ibm.com> writes:
>> Always include a timeout when waiting for secondary cpus to enter OPAL
>> in the kexec path, rather than only when crashing.
>
> This *sounds* reasonable... but I wonder what actual worse case could
> be and why we'd get stuck too long waiting for things?
>
> What was the original bug/problem that inspired this patch?
>
> and is 1s enough?

"It sounds reasonable" was more or less the inspiration :)
While I was going over some of the code relating to the previous kexec
fix with Ben he pointed this out and suggested there wasn't
much of a reason to differentiate between a crashing/non-crashing
cpu as far as the timeout goes - if we're not 'crashing' we still
don't want to spin forever.

I'll let Ben comment on whether 1s per cpu is enough.
On Tue, 2015-07-28 at 16:13 +1000, Samuel Mendoza-Jonas wrote:
> "It sounds reasonable" was more or less the inspiration :)
> While I was going over some of the code relating to the previous kexec
> fix with Ben he pointed this out and suggested there wasn't
> much of a reason to differentiate between a crashing/non-crashing
> cpu as far as the timeout goes - if we're not 'crashing' we still
> don't want to spin forever.
>
> I'll let Ben comment on whether 1s per cpu is enough.

Well, if the scheduler doesn't give us the CPU at the point of kexec
within a second, I think we are in pretty bad shape already, don't you
think ?

I don't mind bumping the timeout if you have worries...

Cheers,
Ben.
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> On Tue, 2015-07-28 at 16:13 +1000, Samuel Mendoza-Jonas wrote:
>
>> "It sounds reasonable" was more or less the inspiration :)
>> While I was going over some of the code relating to the previous kexec
>> fix with Ben he pointed this out and suggested there wasn't
>> much of a reason to differentiate between a crashing/non-crashing
>> cpu as far as the timeout goes - if we're not 'crashing' we still
>> don't want to spin forever.
>>
>> I'll let Ben comment on whether 1s per cpu is enough.
>
> Well, if the scheduler doesn't give us the CPU at the point of kexec
> within a second, I think we are in pretty bad shape already, don't you
> think ?

Quite likely, I think my dislike of magic timeouts just kicked in :)
On Wed, 2015-22-07 at 05:54:29 UTC, Samuel Mendoza-Jonas wrote:
> Always include a timeout when waiting for secondary cpus to enter OPAL
> in the kexec path, rather than only when crashing.
>
> Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1b70386c99e997b359735c75

cheers
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 59076db..f916601 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -195,7 +195,7 @@ static void pnv_kexec_wait_secondaries_down(void)
 
 	for_each_online_cpu(i) {
 		uint8_t status;
-		int64_t rc;
+		int64_t rc, timeout = 1000;
 
 		if (i == my_cpu)
 			continue;
@@ -212,6 +212,18 @@ static void pnv_kexec_wait_secondaries_down(void)
 				       i, paca[i].hw_cpu_id);
 				notified = i;
 			}
+
+			/*
+			 * On crash secondaries might be unreachable or hung,
+			 * so timeout if we've waited too long
+			 * */
+			mdelay(1);
+			if (timeout-- == 0) {
+				printk(KERN_ERR "kexec: timed out waiting for "
+				       "cpu %d (physical %d) to enter OPAL\n",
+				       i, paca[i].hw_cpu_id);
+				break;
+			}
 		}
 	}
 }
@@ -233,13 +245,6 @@ static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
 
 		/* Return the CPU to OPAL */
 		opal_return_cpu();
-	} else if (crash_shutdown) {
-		/*
-		 * On crash, we don't wait for secondaries to go
-		 * down as they might be unreachable or hung, so
-		 * instead we just wait a bit and move on.
-		 */
-		mdelay(1);
 	} else {
 		/* Primary waits for the secondaries to have reached OPAL */
 		pnv_kexec_wait_secondaries_down();
Always include a timeout when waiting for secondary cpus to enter OPAL
in the kexec path, rather than only when crashing.

Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>
---
 arch/powerpc/platforms/powernv/setup.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
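
For readers weighing the "is 1s enough?" question, here is a rough userspace sketch of the per-CPU bounded wait the patch introduces. It is not the kernel code: `struct sim_cpu`, `sim_cpu_ready()` and `sim_wait_secondaries()` are hypothetical names made up for illustration, the firmware status query is replaced by a simulated "ready at" time, and `mdelay(1)` is replaced by advancing a simulated clock. The sketch also checks the budget before delaying, so it waits exactly `timeout_ms` per CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical model of a secondary CPU: the simulated millisecond at
 * which it reports having entered firmware, or -1 if it never does
 * (a hung or unreachable CPU).
 */
struct sim_cpu {
	int64_t ready_at_ms;
};

/* Stand-in for the firmware status query (assumption, not a real API). */
static bool sim_cpu_ready(const struct sim_cpu *cpu, int64_t now_ms)
{
	return cpu->ready_at_ms >= 0 && now_ms >= cpu->ready_at_ms;
}

/*
 * Wait for each CPU in turn, giving every CPU its own timeout_ms budget.
 * Returns the total simulated time spent waiting and stores the number
 * of CPUs that exhausted their budget in *n_timed_out.
 */
static int64_t sim_wait_secondaries(const struct sim_cpu *cpus, size_t n,
				    int64_t timeout_ms, size_t *n_timed_out)
{
	int64_t now_ms = 0;

	assert(cpus != NULL && n_timed_out != NULL);
	*n_timed_out = 0;

	for (size_t i = 0; i < n; i++) {
		int64_t waited_ms = 0;	/* the budget restarts for every CPU */

		while (!sim_cpu_ready(&cpus[i], now_ms)) {
			if (waited_ms == timeout_ms) {
				fprintf(stderr, "timed out waiting for cpu %zu\n", i);
				(*n_timed_out)++;
				break;	/* give up on this CPU, move to the next */
			}
			now_ms++;	/* stands in for mdelay(1) */
			waited_ms++;
		}
	}
	return now_ms;
}
```

Because the budget is per CPU, the worst case in this model grows linearly with the number of unresponsive CPUs: calling `sim_wait_secondaries()` with a 1000 ms timeout on four CPUs where one is ready immediately, one is ready at 5 ms and two never respond returns 2005 and reports two timeouts. Healthy CPUs cost almost nothing, since they are usually already in firmware by the time the loop reaches them.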