[SRU,J,0/7] kdump fails on arm64 when offset is not specified

Message ID 20230710125612.37735-1-ioanna-maria.alifieraki@canonical.com

Message

Ioanna Alifieraki July 10, 2023, 12:56 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2024479

[Impact]

kdump fails on arm64 machines with a lot of memory when the offset is not specified,
e.g. when /etc/default/grub.d/kdump-tools.cfg looks like:
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=4G"

If kdump-tools.cfg specifies the offset e.g.:
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=4G@4G"
it works ok.
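
To make the difference between the two forms concrete, here is a minimal sketch
that reports whether a kernel command line carries an explicit offset. The
function name parse_crashkernel is hypothetical, and the parser is deliberately
simplified: it only understands crashkernel=SIZE and crashkernel=SIZE@OFFSET with
decimal numbers and K/M/G/T suffixes, and it ignores the range and ,high/,low
forms that the kernel also accepts.

```python
import re

# Binary size suffixes handled by this sketch (an assumption; the kernel's
# own parser accepts more forms than these).
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}

# Matches only the simple forms crashkernel=SIZE and crashkernel=SIZE@OFFSET.
_SIMPLE = re.compile(
    r"(?:^|\s)crashkernel=(\d+)([KMGT]?)(?:@(\d+)([KMGT]?))?(?=\s|$)",
    re.IGNORECASE,
)


def parse_crashkernel(cmdline):
    """Return (size_bytes, offset_bytes or None), or None if no simple form is found."""
    m = _SIMPLE.search(cmdline)
    if m is None:
        return None
    size = int(m.group(1)) * _UNITS[m.group(2).upper()]
    if m.group(3) is None:
        # No "@OFFSET" part: the kernel has to pick the location itself,
        # which is the case this series fixes on arm64.
        return size, None
    offset = int(m.group(3)) * _UNITS[m.group(4).upper()]
    return size, offset
```

Feeding it the contents of /proc/cmdline on the affected machine should show an
offset of None for the failing configuration and a concrete offset for the
working one.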

The reason for this is that the kernel needs to allocate memory for the crashkernel both
in low and high memory. This is addressed in kernel 6.2.
In addition, kexec-tools needs to support more than one crash kernel region.

To address this issue the following upstream commits are needed:

From the kernel side:
commit a9ae89df737756d92f0e14873339cf393f7f7eb0
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date: Wed Nov 16 20:10:44 2022 +0800

arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones

commit a149cf00b158e1793a8dd89ca492379c366300d2
Author: Zhen Lei <thunder.leizhen@huawei.com>
Date: Wed Nov 16 20:10:43 2022 +0800

arm64: kdump: Provide default size when crashkernel=Y,low is not specified

[Test]

You need a machine (can be a VM too) with a large amount of memory, e.g. 128G.
Install linux-crashdump and trigger a crash.
It won't work unless the offset is specified because the crashkernel memory cannot be reserved.

With the patches applied it works as expected without having to specify the offset.
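
As a rough aid for the test above, the sketch below lists the "Crash kernel"
regions found in the text of /proc/iomem, so one can see whether a reservation
exists before triggering a crash. The helper name crash_kernel_regions is
hypothetical. Expecting two regions (one low, one high) with the patches applied
is an assumption drawn from the description in [Impact], not something verified
here.

```python
def crash_kernel_regions(iomem_text):
    """Return a list of (start, end) integers for every 'Crash kernel' entry.

    iomem_text is the content of /proc/iomem. Read it as root; unprivileged
    reads show zeroed addresses.
    """
    regions = []
    for line in iomem_text.splitlines():
        # Each line looks like "  c0000000-cfffffff : Crash kernel".
        span, sep, name = line.partition(" : ")
        if not sep or name.strip() != "Crash kernel":
            continue
        start, _, end = span.strip().partition("-")
        regions.append((int(start, 16), int(end, 16)))
    return regions
```

An empty list on a kernel booted with crashkernel=4G would match the failure
described in this report.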

[Regression Potential]

KERNEL 5.15:
To address this problem in the 5.15 kernel we need to pull in 7 commits (see [Other] section for details).
The commits either cherry-pick cleanly or need minor context adjustments.
All the commits change code only for the arm64 architecture, and only the code related to reserving the crashkernel.
This means that any regression will affect only the arm64 architecture, and in particular the crash/kexec functionality.
However, since the crashkernel reservation happens at boot, a regression could also show up as a boot failure.

[Other]

Commits to backport
*** JAMMY:

kernel (5.15 kernel):
a9ae89df737756d92f0e14873339cf393f7f7eb0 arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones
a149cf00b158e1793a8dd89ca492379c366300d2 arm64: kdump: Provide default size when crashkernel=Y,low is not specified
4890cc18f94979b406f95708f8cb238eb2d0e5a9 arm64/mm: Define defer_reserve_crashkernel()
8f0f104e2ab6eed4cad3b111dc206f843bda43ea arm64: kdump: Do not allocate crash low memory if not needed
5832f1ae50600ac6b2b6d00cfef42d33a9473f06 docs: kdump: Update the crashkernel description for arm64
944a45abfabc171fd121315ff0d5e62b11cb5d6f arm64: kdump: Reimplement crashkernel=X
d339f1584f0acf32b32326627fa3b48e6e65c599 arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef

Anshuman Khandual (1):
  arm64/mm: Define defer_reserve_crashkernel()

Chen Zhou (1):
  arm64: kdump: Reimplement crashkernel=X

Jisheng Zhang (1):
  arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef

Zhen Lei (4):
  docs: kdump: Update the crashkernel description for arm64
  arm64: kdump: Do not allocate crash low memory if not needed
  arm64: kdump: Provide default size when crashkernel=Y,low is not
    specified
  arm64: kdump: Support crashkernel=X fall back to reserve region above
    DMA zones

 .../admin-guide/kernel-parameters.txt         | 15 +--
 arch/arm64/include/asm/memory.h               |  5 +
 arch/arm64/kernel/machine_kexec.c             |  9 +-
 arch/arm64/kernel/machine_kexec_file.c        | 12 ++-
 arch/arm64/mm/init.c                          | 98 ++++++++++++++++---
 arch/arm64/mm/mmu.c                           |  6 +-
 6 files changed, 116 insertions(+), 29 deletions(-)

Comments

Tim Gardner July 10, 2023, 3:24 p.m. UTC | #1
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Roxana Nicolescu July 14, 2023, 9:48 a.m. UTC | #2
LGTM.

Acked-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Roxana Nicolescu Aug. 2, 2023, 6:36 a.m. UTC | #3
Applied to jammy:master-next. Thanks!

Roxana