[RESEND,v2] kexec/crash: no crash update when kexec in progress

Message ID 20240911112111.108056-1-sourabhjain@linux.ibm.com (mailing list archive)
State New

Commit Message

Sourabh Jain Sept. 11, 2024, 11:21 a.m. UTC
The following errors are observed when kexec is done with SMT=off on
powerpc.

[  358.458385] Removing IBM Power 842 compression device
[  374.795734] kexec_core: Starting new kernel
[  374.795748] kexec: Waking offline cpu 1.
[  374.875695] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[  374.935833] kexec: Waking offline cpu 2.
[  375.015664] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
snip..
[  375.515823] kexec: Waking offline cpu 6.
[  375.635667] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[  375.695836] kexec: Waking offline cpu 7.

To avoid kexec kernel boot failure on PowerPC, all the present CPUs that
are offline are brought online during kexec. For more information, refer
to commit e8e5c2155b00 ("powerpc/kexec: Fix orphaned offline CPUs across
kexec"). Bringing the CPUs online triggers the crash hotplug handler,
crash_handle_hotplug_event(), to update the kdump image. Since the
system is on the kexec kernel boot path and the kexec lock is held, the
crash_handle_hotplug_event() function fails to acquire the same lock to
update the kdump image, resulting in the error messages mentioned above.

To fix this, return from crash_handle_hotplug_event() without printing
the error message if kexec is in progress.

The same applies to the crash_check_hotplug_support() function. Return
0 if kexec is in progress because the kernel is not in a position to
update the kdump image.

Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: kexec@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
Reported-by: Sachin P Bappalige <sachinpb@linux.vnet.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
Changelog:

Since v1:
 - Keep the kexec_in_progress check within kexec_trylock() - Baoquan He
 - Include the reason why PowerPC brings offline CPUs online
   during the kexec kernel boot path - Baoquan He
 - Rebased on top of #next-20240910 to avoid conflict with the patch below
   https://lore.kernel.org/all/20240812041651.703156-1-sourabhjain@linux.ibm.com/T/#u

V2 RESEND:
 - Update linuxppc-dev mailing list address
---
 kernel/crash_core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
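
The control flow described in the commit message can be sketched in user
space. This is only an illustration under stated assumptions:
mock_kexec_trylock(), mock_kexec_unlock(), mock_handle_hotplug_event() and
simulate_wakeup() are hypothetical stand-ins, not kernel functions, and the
kexec lock is modelled with a C11 atomic_flag rather than the kernel's real
lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* User-space stand-ins for the kexec lock and the kexec_in_progress flag. */
static atomic_flag kexec_lock = ATOMIC_FLAG_INIT;
static bool kexec_in_progress;
static int messages_logged;

/* Returns true if the lock was free and is now held by the caller. */
bool mock_kexec_trylock(void)
{
	/* test_and_set returns the previous state; false means it was free. */
	return !atomic_flag_test_and_set(&kexec_lock);
}

void mock_kexec_unlock(void)
{
	atomic_flag_clear(&kexec_lock);
}

/* Returns true if the (simulated) kdump image update would have run. */
bool mock_handle_hotplug_event(unsigned int cpu)
{
	if (!mock_kexec_trylock()) {
		/* Stay quiet when the lock is held by the kexec boot path. */
		if (!kexec_in_progress) {
			printf("cpu %u: trylock failed, kdump image may be inaccurate\n", cpu);
			messages_logged++;
		}
		return false;
	}
	/* The kdump image update would happen here. */
	mock_kexec_unlock();
	return true;
}

/*
 * Simulates the kexec path holding the lock while it wakes ncpus offline
 * CPUs, each of which triggers one hotplug event. Returns the number of
 * "trylock failed" messages printed during the run.
 */
int simulate_wakeup(unsigned int ncpus, bool in_progress)
{
	int before = messages_logged;

	kexec_in_progress = in_progress;
	(void)mock_kexec_trylock();	/* kexec path takes the lock */
	for (unsigned int cpu = 1; cpu <= ncpus; cpu++)
		mock_handle_hotplug_event(cpu);
	mock_kexec_unlock();
	kexec_in_progress = false;

	return messages_logged - before;
}
```

In this sketch simulate_wakeup(7, true) returns 0, because every failed
trylock is silent while kexec is in progress, whereas simulate_wakeup(7, false)
returns 7, one message per woken CPU, which corresponds to the noise seen in
the log above.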

Comments

Baoquan He Sept. 11, 2024, 2:20 p.m. UTC | #1
On 09/11/24 at 04:51pm, Sourabh Jain wrote:
> The following errors are observed when kexec is done with SMT=off on
> powerpc.
> 
> [  358.458385] Removing IBM Power 842 compression device
> [  374.795734] kexec_core: Starting new kernel
> [  374.795748] kexec: Waking offline cpu 1.
> [  374.875695] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> [  374.935833] kexec: Waking offline cpu 2.
> [  375.015664] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> snip..
> [  375.515823] kexec: Waking offline cpu 6.
> [  375.635667] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
> [  375.695836] kexec: Waking offline cpu 7.
> 
> To avoid kexec kernel boot failure on PowerPC, all the present CPUs that
> are offline are brought online during kexec. For more information, refer
> to commit e8e5c2155b00 ("powerpc/kexec: Fix orphaned offline CPUs across
> kexec"). Bringing the CPUs online triggers the crash hotplug handler,
> crash_handle_hotplug_event(), to update the kdump image. Since the
> system is on the kexec kernel boot path and the kexec lock is held, the
> crash_handle_hotplug_event() function fails to acquire the same lock to
> update the kdump image, resulting in the error messages mentioned above.
> 
> To fix this, return from crash_handle_hotplug_event() without printing
> the error message if kexec is in progress.
> 
> The same applies to the crash_check_hotplug_support() function. Return
> 0 if kexec is in progress because the kernel is not in a position to
> update the kdump image.

LGTM, thanks.

Acked-by: Baoquan He <bhe@redhat.com>

> 
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: kexec@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-kernel@vger.kernel.org
> Cc: x86@kernel.org
> Reported-by: Sachin P Bappalige <sachinpb@linux.vnet.ibm.com>
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> ---
> Changelog:
> 
> Since v1:
>  - Keep the kexec_in_progress check within kexec_trylock() - Baoquan He
>  - Include the reason why PowerPC brings offline CPUs online
>    during the kexec kernel boot path - Baoquan He
>  - Rebased on top of #next-20240910 to avoid conflict with the patch below
>    https://lore.kernel.org/all/20240812041651.703156-1-sourabhjain@linux.ibm.com/T/#u
> 
> V2 RESEND:
>  - Update linuxppc-dev mailing list address
> ---
>  kernel/crash_core.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index c1048893f4b6..078fe5bc5a74 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -505,7 +505,8 @@ int crash_check_hotplug_support(void)
>  	crash_hotplug_lock();
>  	/* Obtain lock while reading crash information */
>  	if (!kexec_trylock()) {
> -		pr_info("kexec_trylock() failed, kdump image may be inaccurate\n");
> +		if (!kexec_in_progress)
> +			pr_info("kexec_trylock() failed, kdump image may be inaccurate\n");
>  		crash_hotplug_unlock();
>  		return 0;
>  	}
> @@ -547,7 +548,8 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu,
>  	crash_hotplug_lock();
>  	/* Obtain lock while changing crash information */
>  	if (!kexec_trylock()) {
> -		pr_info("kexec_trylock() failed, kdump image may be inaccurate\n");
> +		if (!kexec_in_progress)
> +			pr_info("kexec_trylock() failed, kdump image may be inaccurate\n");
>  		crash_hotplug_unlock();
>  		return;
>  	}
> -- 
> 2.46.0
>

Sourabh Jain Sept. 12, 2024, 8:03 a.m. UTC | #2
Hello Baoquan,

On 11/09/24 19:50, Baoquan He wrote:
> LGTM, thanks.
>
> Acked-by: Baoquan He <bhe@redhat.com>

Thank you for the Ack!

My understanding is that this patch will go upstream via the linux-next 
tree, as it is based on 
https://lore.kernel.org/all/20240902034708.88EC1C4CEC2@smtp.kernel.org/ 
which is already part of the linux-next master branch.

- Sourabh Jain

Baoquan He Sept. 12, 2024, 8:35 a.m. UTC | #3
On 09/12/24 at 01:33pm, Sourabh Jain wrote:
> Hello Baoquan,
> 
> On 11/09/24 19:50, Baoquan He wrote:
> > LGTM, thanks.
> > 
> > Acked-by: Baoquan He <bhe@redhat.com>
> 
> Thank you for the Ack!
> 
> My understanding is that this patch will go upstream via the linux-next
> tree, as it is based on
> https://lore.kernel.org/all/20240902034708.88EC1C4CEC2@smtp.kernel.org/
> which is already part of the linux-next master branch. - Sourabh Jain

Then you should mark it as [PATCH linux-next] in the subject.

Since this patch is in generic code, it needs Andrew's help to
pick it up. Let's wait and see if Andrew needs a new post with
the subject changed.

Thanks
Baoquan

Patch

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index c1048893f4b6..078fe5bc5a74 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -505,7 +505,8 @@ int crash_check_hotplug_support(void)
 	crash_hotplug_lock();
 	/* Obtain lock while reading crash information */
 	if (!kexec_trylock()) {
-		pr_info("kexec_trylock() failed, kdump image may be inaccurate\n");
+		if (!kexec_in_progress)
+			pr_info("kexec_trylock() failed, kdump image may be inaccurate\n");
 		crash_hotplug_unlock();
 		return 0;
 	}
@@ -547,7 +548,8 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu,
 	crash_hotplug_lock();
 	/* Obtain lock while changing crash information */
 	if (!kexec_trylock()) {
-		pr_info("kexec_trylock() failed, kdump image may be inaccurate\n");
+		if (!kexec_in_progress)
+			pr_info("kexec_trylock() failed, kdump image may be inaccurate\n");
 		crash_hotplug_unlock();
 		return;
 	}