
[RFC] powerpc/kexec: Wait 1s for secondaries to enter OPAL

Message ID 1437544469-20028-1-git-send-email-sam.mj@au1.ibm.com (mailing list archive)
State Accepted

Commit Message

Sam Mendoza-Jonas July 22, 2015, 5:54 a.m. UTC
Always include a timeout when waiting for secondary cpus to enter OPAL
in the kexec path, rather than only when crashing.

Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>
---
 arch/powerpc/platforms/powernv/setup.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

Comments

Stewart Smith July 27, 2015, 5:56 a.m. UTC | #1
Samuel Mendoza-Jonas <sam.mj@au1.ibm.com> writes:
> Always include a timeout when waiting for secondary cpus to enter OPAL
> in the kexec path, rather than only when crashing.

This *sounds* reasonable... but I wonder what the actual worst case could
be and why we'd get stuck too long waiting for things?

What was the original bug/problem that inspired this patch?

and is 1s enough?
Sam Mendoza-Jonas July 28, 2015, 6:13 a.m. UTC | #2
On 27/07/15 15:56, Stewart Smith wrote:
> Samuel Mendoza-Jonas <sam.mj@au1.ibm.com> writes:
>> Always include a timeout when waiting for secondary cpus to enter OPAL
>> in the kexec path, rather than only when crashing.
> 
> This *sounds* reasonable... but I wonder what the actual worst case could
> be and why we'd get stuck too long waiting for things?
> 
> What was the original bug/problem that inspired this patch?
> 
> and is 1s enough?

"It sounds reasonable" was more or less the inspiration :)
While I was going over some of the code relating to the previous kexec
fix with Ben, he pointed this out and suggested there wasn't
much of a reason to differentiate between a crashing/non-crashing
cpu as far as the timeout goes - if we're not 'crashing' we still
don't want to spin forever.

I'll let Ben comment on whether 1s per cpu is enough.

> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
Benjamin Herrenschmidt July 28, 2015, 9:58 a.m. UTC | #3
On Tue, 2015-07-28 at 16:13 +1000, Samuel Mendoza-Jonas wrote:

> "It sounds reasonable" was more or less the inspiration :)
> While I was going over some of the code relating to the previous kexec
> fix with Ben, he pointed this out and suggested there wasn't
> much of a reason to differentiate between a crashing/non-crashing
> cpu as far as the timeout goes - if we're not 'crashing' we still
> don't want to spin forever.
> 
> I'll let Ben comment on whether 1s per cpu is enough.

Well, if the scheduler doesn't give us the CPU at the point of kexec
within a second, I think we are in pretty bad shape already, don't you
think?

I don't mind bumping the timeout if you have worries...

Cheers,
Ben.
Stewart Smith July 29, 2015, 7:24 a.m. UTC | #4
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> On Tue, 2015-07-28 at 16:13 +1000, Samuel Mendoza-Jonas wrote:
>
>> "It sounds reasonable" was more or less the inspiration :)
>> While I was going over some of the code relating to the previous kexec
>> fix with Ben, he pointed this out and suggested there wasn't
>> much of a reason to differentiate between a crashing/non-crashing
>> cpu as far as the timeout goes - if we're not 'crashing' we still
>> don't want to spin forever.
>> 
>> I'll let Ben comment on whether 1s per cpu is enough.
>
> Well, if the scheduler doesn't give us the CPU at the point of kexec
> within a second, I think we are in pretty bad shape already, don't you
> think?

Quite likely; I think my dislike of magic timeouts just kicked in :)
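
In the patch, `timeout` is declared inside the `for_each_online_cpu()` loop body. The 1s budget therefore restarts for every secondary, and the worst case Stewart asks about grows linearly with the number of CPUs that fail to reach OPAL. The back-of-the-envelope sketch below illustrates this. The helper name `min_wait_ms` and the constant `DELAYS_PER_HUNG_CPU` are hypothetical, and the time spent in the status queries themselves is ignored.

```c
#include <assert.h>

/* Per-secondary budget in the patch: timeout = 1000 with a post-decrement
 * check, so a CPU that never enters OPAL costs 1001 one-millisecond delays. */
#define DELAYS_PER_HUNG_CPU 1001L

/*
 * Lower bound, in ms, on how long the primary spends in the wait loop when
 * `hung` secondaries never reach OPAL. Because the budget is reset for each
 * CPU, the total scales linearly with `hung`. Query time is not counted,
 * so real waits are somewhat longer.
 */
static long min_wait_ms(long hung)
{
	assert(hung >= 0);
	return hung * DELAYS_PER_HUNG_CPU;
}
```

For example, if 63 secondaries all hung, the primary would spend at least 63 × 1001 ms (just over a minute) before moving on with the kexec.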
Michael Ellerman Oct. 12, 2015, 11:21 a.m. UTC | #5
On Wed, 2015-22-07 at 05:54:29 UTC, Samuel Mendoza-Jonas wrote:
> Always include a timeout when waiting for secondary cpus to enter OPAL
> in the kexec path, rather than only when crashing.
> 
> Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1b70386c99e997b359735c75

cheers

Patch

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 59076db..f916601 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -195,7 +195,7 @@  static void pnv_kexec_wait_secondaries_down(void)
 
 	for_each_online_cpu(i) {
 		uint8_t status;
-		int64_t rc;
+		int64_t rc, timeout = 1000;
 
 		if (i == my_cpu)
 			continue;
@@ -212,6 +212,18 @@  static void pnv_kexec_wait_secondaries_down(void)
 				       i, paca[i].hw_cpu_id);
 				notified = i;
 			}
+
+			/*
+			 * On crash secondaries might be unreachable or hung,
+			 * so timeout if we've waited too long
+			 * */
+			mdelay(1);
+			if (timeout-- == 0) {
+				printk(KERN_ERR "kexec: timed out waiting for "
+				       "cpu %d (physical %d) to enter OPAL\n",
+				       i, paca[i].hw_cpu_id);
+				break;
+			}
 		}
 	}
 }
@@ -233,13 +245,6 @@  static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
 
 		/* Return the CPU to OPAL */
 		opal_return_cpu();
-	} else if (crash_shutdown) {
-		/*
-		 * On crash, we don't wait for secondaries to go
-		 * down as they might be unreachable or hung, so
-		 * instead we just wait a bit and move on.
-		 */
-		mdelay(1);
 	} else {
 		/* Primary waits for the secondaries to have reached OPAL */
 		pnv_kexec_wait_secondaries_down();
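
The following is a minimal user-space sketch of the bounded wait that the patch adds to `pnv_kexec_wait_secondaries_down()`. It is not kernel code. The OPAL status query is replaced by a hypothetical callback type (`still_running_fn`), and `polls++` stands in for `mdelay(1)`. The names `wait_for_cpu`, `never_enters_opal` and `enters_after_five_polls` are illustrative only. The sketch does keep the patch's `timeout = 1000` initial value and its `timeout-- == 0` test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the OPAL status query: returns true while
 * `cpu` is still running Linux code (has not entered OPAL yet).
 * `polls` is how many 1 ms delays have elapsed for this CPU. */
typedef bool (*still_running_fn)(int cpu, int polls);

/*
 * Bounded wait modelled on the patched loop: each secondary gets its own
 * budget (timeout = 1000), one unit per 1 ms delay. Returns the number of
 * delays spent on `cpu` and sets *timed_out if the budget ran out.
 */
static int wait_for_cpu(int cpu, still_running_fn still_running, bool *timed_out)
{
	int64_t timeout = 1000;
	int polls = 0;

	assert(still_running && timed_out);
	*timed_out = false;

	for (;;) {
		if (!still_running(cpu, polls))
			break;			/* CPU reached OPAL: stop waiting */
		polls++;			/* stands in for mdelay(1) */
		if (timeout-- == 0) {		/* same post-decrement test as the patch */
			*timed_out = true;
			break;
		}
	}
	return polls;
}

/* Example CPU that never enters OPAL (hung or unreachable). */
static bool never_enters_opal(int cpu, int polls)
{
	(void)cpu;
	(void)polls;
	return true;
}

/* Example CPU that enters OPAL after 5 ms. */
static bool enters_after_five_polls(int cpu, int polls)
{
	(void)cpu;
	return polls < 5;
}
```

With the post-decrement test, a CPU that never enters OPAL costs 1001 one-millisecond delays rather than 1000. The effective budget is therefore slightly over 1s per secondary, plus whatever time the status queries take. The removed `crash_shutdown` branch, by contrast, waited a single `mdelay(1)` in total and never checked the secondaries at all.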