mbox series

[v17,00/10] support reserving crashkernel above 4G on arm64 kdump

Message ID 20211210065533.2023-1-thunder.leizhen@huawei.com
Headers show
Series support reserving crashkernel above 4G on arm64 kdump | expand

Message

Leizhen (ThunderTown) Dec. 10, 2021, 6:55 a.m. UTC
There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is no enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump
kernel will boot failure because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in DMA zone and fall back to high
allocation if it fails.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically and "crashkernel=Y,low" can be used to allocate
specified size low memory.

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices. So there may be two regions reserved for
crash dump kernel.
In order to distinct from the high region and make no effect to the use
of existing kexec-tools, rename the low region as "Crash kernel (low)",
and pass the low region by reusing DT property
"linux,usable-memory-range". We made the low memory region as the last
range of "linux,usable-memory-range" to keep compatibility with existing
user-space and older kdump kernels.

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions(see [1])

Another update is document about DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema(see [2])

This patchset contains the following 10 patches:

0001-0004 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
0005 makes functions reserve_crashkernel[_low]() generic.
0006-0007 reimplements arm64 crashkernel=X.
0008-0009 adds memory for devices by DT property linux,usable-memory-range.
0010 updates the doc.

Changes since [v16]
- Because no functional changes in this version, so add
  "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9
- Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8
- Update patch 9 based on the review comments of Rob Herring
- As Catalin Marinas's suggestion, merge the implementation of
  ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
  contents of X86 and ARM64 do not overlap, and reduce unnecessary
  temporary differences.

Changes since [v15]
-  Aggregate the processing of "linux,usable-memory-range" into one function.
   Only patch 9-10 have been updated.

Changes since [v14]
- Recovering the requirement that the CrashKernel memory regions on X86
  only requires 1 MiB alignment.
- Combine patches 5 and 6 in v14 into one. The compilation warning fixed
  by patch 6 was introduced by patch 5 in v14.
- As with crashk_res, crashk_low_res is also processed by
  crash_exclude_mem_range() in patch 7.
- Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
  has removed the architecture-specific code, extend the property "linux,usable-memory-range"
  in the platform-agnostic FDT core code. See patch 9.
- Discard the x86 description update in the document, because the description
  has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
- Change "arm64" to "ARM64" in Doc.


Changes since [v13]
- Rebased on top of 5.11-rc5.
- Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
Since reserve_crashkernel[_low]() implementations are quite similar on
other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
arch/Kconfig and select this by X86 and ARM64.
- Some minor cleanup.

Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
and arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.

Changes since [v9]
- Patch 1 add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
suggested by Arnd.
- Add Tested-by from Jhon and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, reserve spcified size low
memory for crash kdump kernel devices firstly and then reserve memory above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass to crash dump kernel by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() i added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions, for two crash kernel regions
case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19 
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
[v12]: https://lkml.org/lkml/2020/9/7/1037
[v13]: https://lkml.org/lkml/2020/10/31/34
[v14]: https://lkml.org/lkml/2021/1/30/53
[v15]: https://lkml.org/lkml/2021/10/19/1405
[v16]: https://lkml.org/lkml/2021/11/23/435


Chen Zhou (9):
  x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
  x86: kdump: make the lower bound of crash kernel reservation
    consistent
  x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
    reserve_crashkernel()
  x86: kdump: move xen_pv_domain() check and insert_resource() to
    setup_arch()
  x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
  arm64: kdump: introduce some macros for crash kernel reservation
  arm64: kdump: reimplement crashkernel=X
  of: fdt: Add memory for devices by DT property
    "linux,usable-memory-range"
  kdump: update Documentation about crashkernel

Zhen Lei (1):
  of: fdt: Aggregate the processing of "linux,usable-memory-range"

 Documentation/admin-guide/kdump/kdump.rst     |  11 +-
 .../admin-guide/kernel-parameters.txt         |  11 +-
 arch/Kconfig                                  |   3 +
 arch/arm64/Kconfig                            |   1 +
 arch/arm64/include/asm/kexec.h                |  10 ++
 arch/arm64/kernel/machine_kexec_file.c        |  12 +-
 arch/arm64/kernel/setup.c                     |  13 +-
 arch/arm64/mm/init.c                          |  59 ++-----
 arch/x86/Kconfig                              |   2 +
 arch/x86/include/asm/elf.h                    |   3 +
 arch/x86/include/asm/kexec.h                  |  31 +++-
 arch/x86/kernel/setup.c                       | 163 ++----------------
 drivers/of/fdt.c                              |  42 +++--
 include/linux/crash_core.h                    |   3 +
 include/linux/kexec.h                         |   2 -
 kernel/crash_core.c                           | 156 +++++++++++++++++
 kernel/kexec_core.c                           |  17 --
 17 files changed, 301 insertions(+), 238 deletions(-)

Comments

Kefeng Wang Dec. 10, 2021, 7:15 a.m. UTC | #1
On 2021/12/10 14:55, Zhen Lei wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
>
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone and fall back to high
> allocation if it fails.
>
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> specified size low memory.
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
>
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
>
> This patchset contains the following 10 patches:
>
> 0001-0004 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
> 0005 makes functions reserve_crashkernel[_low]() generic.
> 0006-0007 reimplements arm64 crashkernel=X.
> 0008-0009 adds memory for devices by DT property linux,usable-memory-range.
> 0010 updates the doc.
>
> Changes since [v16]
> - Because no functional changes in this version, so add
>    "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9
> - Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8
> - Update patch 9 based on the review comments of Rob Herring
> - As Catalin Marinas's suggestion, merge the implementation of
>    ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
>    contents of X86 and ARM64 do not overlap, and reduce unnecessary
>    temporary differences.

An Internal review has been done, so for this series,

Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Baoquan He Dec. 13, 2021, 1:17 p.m. UTC | #2
On 12/10/21 at 02:55pm, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> Move CRASH_ALIGN to header asm/kexec.h for later use.
> 
> Suggested-by: Dave Young <dyoung@redhat.com>
> Suggested-by: Baoquan He <bhe@redhat.com>

I remember Dave and I discussed and suggested this when reviewing.
You can remove my Suggested-by.

For this one, I would like to add ack:

Acked-by: Baoquan He <bhe@redhat.com>

> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> ---
>  arch/x86/include/asm/kexec.h | 3 +++
>  arch/x86/kernel/setup.c      | 3 ---
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 11b7c06e2828c30..3a22e65262aa70b 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -18,6 +18,9 @@
>  
>  # define KEXEC_CONTROL_CODE_MAX_SIZE	2048
>  
> +/* 16M alignment for crash kernel regions */
> +#define CRASH_ALIGN		SZ_16M
> +
>  #ifndef __ASSEMBLY__
>  
>  #include <linux/string.h>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 6a190c7f4d71b05..5cc60996eac56d6 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -392,9 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>  
>  #ifdef CONFIG_KEXEC_CORE
>  
> -/* 16M alignment for crash kernel regions */
> -#define CRASH_ALIGN		SZ_16M
> -
>  /*
>   * Keep the crash kernel below this limit.
>   *
> -- 
> 2.25.1
>
Baoquan He Dec. 13, 2021, 1:37 p.m. UTC | #3
On 12/10/21 at 02:55pm, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> The lower bounds of crash kernel reservation and crash kernel low
> reservation are different, use the consistent value CRASH_ALIGN.
> 
> Suggested-by: Dave Young <dyoung@redhat.com>
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

You may need add Co-developed-by to clarify who is author, and who is
co-author. Please check section "When to use Acked-by:, Cc:, and Co-developed-by:"
of Documentation/process/submitting-patches.rst. Otherwise, 

Acked-by: Baoquan He <bhe@redhat.com>

> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> ---
>  arch/x86/kernel/setup.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 5cc60996eac56d6..6424ee4f23da2cf 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
>  			return 0;
>  	}
>  
> -	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
> +	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> +			CRASH_ADDR_LOW_MAX);
>  	if (!low_base) {
>  		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>  		       (unsigned long)(low_size >> 20));
> -- 
> 2.25.1
>
John Donnelly Dec. 13, 2021, 2:26 p.m. UTC | #4
On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> Move CRASH_ALIGN to header asm/kexec.h for later use.
> 
> Suggested-by: Dave Young <dyoung@redhat.com>
> Suggested-by: Baoquan He <bhe@redhat.com>
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
 >
  Acked-by: John Donnelly <john.p.donnelly@oracle.com>


> ---
>   arch/x86/include/asm/kexec.h | 3 +++
>   arch/x86/kernel/setup.c      | 3 ---
>   2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 11b7c06e2828c30..3a22e65262aa70b 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -18,6 +18,9 @@
>   
>   # define KEXEC_CONTROL_CODE_MAX_SIZE	2048
>   
> +/* 16M alignment for crash kernel regions */
> +#define CRASH_ALIGN		SZ_16M
> +
>   #ifndef __ASSEMBLY__
>   
>   #include <linux/string.h>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 6a190c7f4d71b05..5cc60996eac56d6 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -392,9 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>   
>   #ifdef CONFIG_KEXEC_CORE
>   
> -/* 16M alignment for crash kernel regions */
> -#define CRASH_ALIGN		SZ_16M
> -
>   /*
>    * Keep the crash kernel below this limit.
>    *
John Donnelly Dec. 13, 2021, 2:27 p.m. UTC | #5
On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> The lower bounds of crash kernel reservation and crash kernel low
> reservation are different, use the consistent value CRASH_ALIGN.
> 
> Suggested-by: Dave Young <dyoung@redhat.com>
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>

  Acked-by: John Donnelly <john.p.donnelly@oracle.com>

> ---
>   arch/x86/kernel/setup.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 5cc60996eac56d6..6424ee4f23da2cf 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
>   			return 0;
>   	}
>   
> -	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
> +	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> +			CRASH_ADDR_LOW_MAX);
>   	if (!low_base) {
>   		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>   		       (unsigned long)(low_size >> 20));
John Donnelly Dec. 13, 2021, 2:28 p.m. UTC | #6
On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> To make the functions reserve_crashkernel() as generic,
> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
> 
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> Acked-by: Baoquan He <bhe@redhat.com>

  Acked-by: John Donnelly <john.p.donnelly@oracle.com>

> ---
>   arch/x86/kernel/setup.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 6424ee4f23da2cf..bb2a0973b98059e 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void)
>   	if (!crash_base) {
>   		/*
>   		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> -		 * crashkernel=x,high reserves memory over 4G, also allocates
> -		 * 256M extra low memory for DMA buffers and swiotlb.
> +		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> +		 * also allocates 256M extra low memory for DMA buffers
> +		 * and swiotlb.
>   		 * But the extra memory is not required for all machines.
>   		 * So try low memory first and fall back to high memory
>   		 * unless "crashkernel=size[KMG],high" is specified.
> @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
>   		}
>   	}
>   
> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> +	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>   		memblock_phys_free(crash_base, crash_size);
>   		return;
>   	}
John Donnelly Dec. 13, 2021, 2:29 p.m. UTC | #7
On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> We will make the functions reserve_crashkernel() as generic, the
> xen_pv_domain() check in reserve_crashkernel() is relevant only to
> x86, the same as insert_resource() in reserve_crashkernel[_low]().
> So move xen_pv_domain() check and insert_resource() to setup_arch()
> to keep them in x86.
> 
> Suggested-by: Mike Rapoport <rppt@kernel.org>
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> Acked-by: Baoquan He <bhe@redhat.com>

  Acked-by: John Donnelly <john.p.donnelly@oracle.com>

> ---
>   arch/x86/kernel/setup.c | 19 +++++++++++--------
>   1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index bb2a0973b98059e..7ae00716a208f82 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
>   
>   	crashk_low_res.start = low_base;
>   	crashk_low_res.end   = low_base + low_size - 1;
> -	insert_resource(&iomem_resource, &crashk_low_res);
>   #endif
>   	return 0;
>   }
> @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
>   		high = true;
>   	}
>   
> -	if (xen_pv_domain()) {
> -		pr_info("Ignoring crashkernel for a Xen PV domain\n");
> -		return;
> -	}
> -
>   	/* 0 means: find the address automatically */
>   	if (!crash_base) {
>   		/*
> @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
>   
>   	crashk_res.start = crash_base;
>   	crashk_res.end   = crash_base + crash_size - 1;
> -	insert_resource(&iomem_resource, &crashk_res);
>   }
>   #else
>   static void __init reserve_crashkernel(void)
> @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
>   	 * Reserve memory for crash kernel after SRAT is parsed so that it
>   	 * won't consume hotpluggable memory.
>   	 */
> -	reserve_crashkernel();
> +	if (xen_pv_domain())
> +		pr_info("Ignoring crashkernel for a Xen PV domain\n");
> +	else {
> +		reserve_crashkernel();
> +#ifdef CONFIG_KEXEC_CORE
> +		if (crashk_res.end > crashk_res.start)
> +			insert_resource(&iomem_resource, &crashk_res);
> +		if (crashk_low_res.end > crashk_low_res.start)
> +			insert_resource(&iomem_resource, &crashk_low_res);
> +#endif
> +	}
>   
>   	memblock_find_dma_reserve();
>
John Donnelly Dec. 13, 2021, 2:30 p.m. UTC | #8
On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> Make the functions reserve_crashkernel[_low]() as generic. Since
> reserve_crashkernel[_low]() implementations are quite similar on other
> architectures as well, we can have more users of this later.
> 
> So have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and
> select this by X86.
> 
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>

  Acked-by: John Donnelly <john.p.donnelly@oracle.com>
> ---
>   arch/Kconfig                 |   3 +
>   arch/x86/Kconfig             |   2 +
>   arch/x86/include/asm/elf.h   |   3 +
>   arch/x86/include/asm/kexec.h |  28 ++++++-
>   arch/x86/kernel/setup.c      | 143 +-------------------------------
>   include/linux/crash_core.h   |   3 +
>   include/linux/kexec.h        |   2 -
>   kernel/crash_core.c          | 156 +++++++++++++++++++++++++++++++++++
>   kernel/kexec_core.c          |  17 ----
>   9 files changed, 194 insertions(+), 163 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index d3c4ab249e9c275..7bdb32c41985dc5 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -24,6 +24,9 @@ config KEXEC_ELF
>   config HAVE_IMA_KEXEC
>   	bool
>   
> +config ARCH_WANT_RESERVE_CRASH_KERNEL
> +	bool
> +
>   config SET_FS
>   	bool
>   
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5c2ccb85f2efb86..bd78ed8193079b9 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -12,6 +12,7 @@ config X86_32
>   	depends on !64BIT
>   	# Options that are inherently 32-bit kernel only:
>   	select ARCH_WANT_IPC_PARSE_VERSION
> +	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
>   	select CLKSRC_I8253
>   	select CLONE_BACKWARDS
>   	select GENERIC_VDSO_32
> @@ -28,6 +29,7 @@ config X86_64
>   	select ARCH_HAS_GIGANTIC_PAGE
>   	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
>   	select ARCH_USE_CMPXCHG_LOCKREF
> +	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
>   	select HAVE_ARCH_SOFT_DIRTY
>   	select MODULES_USE_ELF_RELA
>   	select NEED_DMA_MAP_STATE
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 29fea180a6658e8..7a6c36cff8331f5 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled;
>   
>   #define elf_check_arch(x)	elf_check_arch_ia32(x)
>   
> +/* We can also handle crash dumps from 64 bit kernel. */
> +# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
> +
>   /* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx
>      contains a pointer to a function which might be registered using `atexit'.
>      This provides a mean for the dynamic linker to call DT_FINI functions for
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 3a22e65262aa70b..3ff38a1353a2b86 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -21,6 +21,27 @@
>   /* 16M alignment for crash kernel regions */
>   #define CRASH_ALIGN		SZ_16M
>   
> +/*
> + * Keep the crash kernel below this limit.
> + *
> + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
> + * due to mapping restrictions.
> + *
> + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
> + * the upper limit of system RAM in 4-level paging mode. Since the kdump
> + * jump could be from 5-level paging to 4-level paging, the jump will fail if
> + * the kernel is put above 64 TB, and during the 1st kernel bootup there's
> + * no good way to detect the paging mode of the target kernel which will be
> + * loaded for dumping.
> + */
> +#ifdef CONFIG_X86_32
> +# define CRASH_ADDR_LOW_MAX	SZ_512M
> +# define CRASH_ADDR_HIGH_MAX	SZ_512M
> +#else
> +# define CRASH_ADDR_LOW_MAX	SZ_4G
> +# define CRASH_ADDR_HIGH_MAX	SZ_64T
> +#endif
> +
>   #ifndef __ASSEMBLY__
>   
>   #include <linux/string.h>
> @@ -51,9 +72,6 @@ struct kimage;
>   
>   /* The native architecture */
>   # define KEXEC_ARCH KEXEC_ARCH_386
> -
> -/* We can also handle crash dumps from 64 bit kernel. */
> -# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
>   #else
>   /* Maximum physical address we can use pages from */
>   # define KEXEC_SOURCE_MEMORY_LIMIT      (MAXMEM-1)
> @@ -195,6 +213,10 @@ typedef void crash_vmclear_fn(void);
>   extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
>   extern void kdump_nmi_shootdown_cpus(void);
>   
> +#ifdef CONFIG_KEXEC_CORE
> +extern void __init reserve_crashkernel(void);
> +#endif
> +
>   #endif /* __ASSEMBLY__ */
>   
>   #endif /* _ASM_X86_KEXEC_H */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 7ae00716a208f82..5519baa7f4b964e 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -39,6 +39,7 @@
>   #include <asm/io_apic.h>
>   #include <asm/kasan.h>
>   #include <asm/kaslr.h>
> +#include <asm/kexec.h>
>   #include <asm/mce.h>
>   #include <asm/mtrr.h>
>   #include <asm/realmode.h>
> @@ -386,147 +387,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>   	}
>   }
>   
> -/*
> - * --------- Crashkernel reservation ------------------------------
> - */
> -
> -#ifdef CONFIG_KEXEC_CORE
> -
> -/*
> - * Keep the crash kernel below this limit.
> - *
> - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
> - * due to mapping restrictions.
> - *
> - * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
> - * the upper limit of system RAM in 4-level paging mode. Since the kdump
> - * jump could be from 5-level paging to 4-level paging, the jump will fail if
> - * the kernel is put above 64 TB, and during the 1st kernel bootup there's
> - * no good way to detect the paging mode of the target kernel which will be
> - * loaded for dumping.
> - */
> -#ifdef CONFIG_X86_32
> -# define CRASH_ADDR_LOW_MAX	SZ_512M
> -# define CRASH_ADDR_HIGH_MAX	SZ_512M
> -#else
> -# define CRASH_ADDR_LOW_MAX	SZ_4G
> -# define CRASH_ADDR_HIGH_MAX	SZ_64T
> -#endif
> -
> -static int __init reserve_crashkernel_low(void)
> -{
> -#ifdef CONFIG_X86_64
> -	unsigned long long base, low_base = 0, low_size = 0;
> -	unsigned long low_mem_limit;
> -	int ret;
> -
> -	low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
> -
> -	/* crashkernel=Y,low */
> -	ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
> -	if (ret) {
> -		/*
> -		 * two parts from kernel/dma/swiotlb.c:
> -		 * -swiotlb size: user-specified with swiotlb= or default.
> -		 *
> -		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> -		 * to 8M for other buffers that may need to stay low too. Also
> -		 * make sure we allocate enough extra low memory so that we
> -		 * don't run out of DMA buffers for 32-bit devices.
> -		 */
> -		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> -	} else {
> -		/* passed with crashkernel=0,low ? */
> -		if (!low_size)
> -			return 0;
> -	}
> -
> -	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> -			CRASH_ADDR_LOW_MAX);
> -	if (!low_base) {
> -		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> -		       (unsigned long)(low_size >> 20));
> -		return -ENOMEM;
> -	}
> -
> -	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
> -		(unsigned long)(low_size >> 20),
> -		(unsigned long)(low_base >> 20),
> -		(unsigned long)(low_mem_limit >> 20));
> -
> -	crashk_low_res.start = low_base;
> -	crashk_low_res.end   = low_base + low_size - 1;
> -#endif
> -	return 0;
> -}
> -
> -static void __init reserve_crashkernel(void)
> -{
> -	unsigned long long crash_size, crash_base, total_mem;
> -	bool high = false;
> -	int ret;
> -
> -	total_mem = memblock_phys_mem_size();
> -
> -	/* crashkernel=XM */
> -	ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
> -	if (ret != 0 || crash_size <= 0) {
> -		/* crashkernel=X,high */
> -		ret = parse_crashkernel_high(boot_command_line, total_mem,
> -					     &crash_size, &crash_base);
> -		if (ret != 0 || crash_size <= 0)
> -			return;
> -		high = true;
> -	}
> -
> -	/* 0 means: find the address automatically */
> -	if (!crash_base) {
> -		/*
> -		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> -		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> -		 * also allocates 256M extra low memory for DMA buffers
> -		 * and swiotlb.
> -		 * But the extra memory is not required for all machines.
> -		 * So try low memory first and fall back to high memory
> -		 * unless "crashkernel=size[KMG],high" is specified.
> -		 */
> -		if (!high)
> -			crash_base = memblock_phys_alloc_range(crash_size,
> -						CRASH_ALIGN, CRASH_ALIGN,
> -						CRASH_ADDR_LOW_MAX);
> -		if (!crash_base)
> -			crash_base = memblock_phys_alloc_range(crash_size,
> -						CRASH_ALIGN, CRASH_ALIGN,
> -						CRASH_ADDR_HIGH_MAX);
> -		if (!crash_base) {
> -			pr_info("crashkernel reservation failed - No suitable area found.\n");
> -			return;
> -		}
> -	} else {
> -		unsigned long long start;
> -
> -		start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
> -						  crash_base + crash_size);
> -		if (start != crash_base) {
> -			pr_info("crashkernel reservation failed - memory is in use.\n");
> -			return;
> -		}
> -	}
> -
> -	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> -		memblock_phys_free(crash_base, crash_size);
> -		return;
> -	}
> -
> -	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> -		(unsigned long)(crash_size >> 20),
> -		(unsigned long)(crash_base >> 20),
> -		(unsigned long)(total_mem >> 20));
> -
> -	crashk_res.start = crash_base;
> -	crashk_res.end   = crash_base + crash_size - 1;
> -}
> -#else
> +#ifndef CONFIG_KEXEC_CORE
>   static void __init reserve_crashkernel(void)
>   {
>   }
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index de62a722431e7db..f6b99da4ed08ecf 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -73,6 +73,9 @@ extern unsigned char *vmcoreinfo_data;
>   extern size_t vmcoreinfo_size;
>   extern u32 *vmcoreinfo_note;
>   
> +extern struct resource crashk_res;
> +extern struct resource crashk_low_res;
> +
>   Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>   			  void *data, size_t data_len);
>   void final_note(Elf_Word *buf);
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 0c994ae37729e1e..cd744d962f6f417 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -352,8 +352,6 @@ extern int kexec_load_disabled;
>   
>   /* Location of a reserved region to hold the crash kernel.
>    */
> -extern struct resource crashk_res;
> -extern struct resource crashk_low_res;
>   extern note_buf_t __percpu *crash_notes;
>   
>   /* flag to track if kexec reboot is in progress */
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index eb53f5ec62c900f..b23cfc0ca8905fd 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -8,6 +8,12 @@
>   #include <linux/crash_core.h>
>   #include <linux/utsname.h>
>   #include <linux/vmalloc.h>
> +#include <linux/memblock.h>
> +#include <linux/swiotlb.h>
> +
> +#ifdef CONFIG_KEXEC_CORE
> +#include <asm/kexec.h>
> +#endif
>   
>   #include <asm/page.h>
>   #include <asm/sections.h>
> @@ -22,6 +28,22 @@ u32 *vmcoreinfo_note;
>   /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
>   static unsigned char *vmcoreinfo_data_safecopy;
>   
> +/* Location of the reserved area for the crash kernel */
> +struct resource crashk_res = {
> +	.name  = "Crash kernel",
> +	.start = 0,
> +	.end   = 0,
> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> +	.desc  = IORES_DESC_CRASH_KERNEL
> +};
> +struct resource crashk_low_res = {
> +	.name  = "Crash kernel",
> +	.start = 0,
> +	.end   = 0,
> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> +	.desc  = IORES_DESC_CRASH_KERNEL
> +};
> +
>   /*
>    * parsing the "crashkernel" commandline
>    *
> @@ -295,6 +317,140 @@ int __init parse_crashkernel_low(char *cmdline,
>   				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
>   }
>   
> +/*
> + * --------- Crashkernel reservation ------------------------------
> + */
> +
> +#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
> +static int __init reserve_crashkernel_low(void)
> +{
> +#ifdef CONFIG_64BIT
> +	unsigned long long base, low_base = 0, low_size = 0;
> +	unsigned long low_mem_limit;
> +	int ret;
> +
> +	low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
> +
> +	/* crashkernel=Y,low */
> +	ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
> +	if (ret) {
> +		/*
> +		 * two parts from kernel/dma/swiotlb.c:
> +		 * -swiotlb size: user-specified with swiotlb= or default.
> +		 *
> +		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> +		 * to 8M for other buffers that may need to stay low too. Also
> +		 * make sure we allocate enough extra low memory so that we
> +		 * don't run out of DMA buffers for 32-bit devices.
> +		 */
> +		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> +	} else {
> +		/* passed with crashkernel=0,low ? */
> +		if (!low_size)
> +			return 0;
> +	}
> +
> +	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> +			CRASH_ADDR_LOW_MAX);
> +	if (!low_base) {
> +		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> +		       (unsigned long)(low_size >> 20));
> +		return -ENOMEM;
> +	}
> +
> +	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
> +		(unsigned long)(low_size >> 20),
> +		(unsigned long)(low_base >> 20),
> +		(unsigned long)(low_mem_limit >> 20));
> +
> +	crashk_low_res.start = low_base;
> +	crashk_low_res.end   = low_base + low_size - 1;
> +#endif
> +	return 0;
> +}
> +
> +/*
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +void __init reserve_crashkernel(void)
> +{
> +	unsigned long long crash_size, crash_base, total_mem;
> +	bool high = false;
> +	int ret;
> +
> +	total_mem = memblock_phys_mem_size();
> +
> +	/* crashkernel=XM */
> +	ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
> +	if (ret != 0 || crash_size <= 0) {
> +		/* crashkernel=X,high */
> +		ret = parse_crashkernel_high(boot_command_line, total_mem,
> +					     &crash_size, &crash_base);
> +		if (ret != 0 || crash_size <= 0)
> +			return;
> +		high = true;
> +	}
> +
> +	/* 0 means: find the address automatically */
> +	if (!crash_base) {
> +		/*
> +		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> +		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> +		 * also allocates 256M extra low memory for DMA buffers
> +		 * and swiotlb.
> +		 * But the extra memory is not required for all machines.
> +		 * So try low memory first and fall back to high memory
> +		 * unless "crashkernel=size[KMG],high" is specified.
> +		 */
> +		if (!high)
> +			crash_base = memblock_phys_alloc_range(crash_size,
> +						CRASH_ALIGN, CRASH_ALIGN,
> +						CRASH_ADDR_LOW_MAX);
> +		if (!crash_base)
> +			crash_base = memblock_phys_alloc_range(crash_size,
> +						CRASH_ALIGN, CRASH_ALIGN,
> +						CRASH_ADDR_HIGH_MAX);
> +		if (!crash_base) {
> +			pr_info("crashkernel reservation failed - No suitable area found.\n");
> +			return;
> +		}
> +	} else {
> +		/* User specifies base address explicitly. */
> +		unsigned long long start;
> +
> +		if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
> +			pr_warn("cannot reserve crashkernel: base address is not %ldMB aligned\n",
> +				(unsigned long)CRASH_ALIGN >> 20);
> +			return;
> +		}
> +
> +		start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
> +						  crash_base + crash_size);
> +		if (start != crash_base) {
> +			pr_info("crashkernel reservation failed - memory is in use.\n");
> +			return;
> +		}
> +	}
> +
> +	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> +		memblock_phys_free(crash_base, crash_size);
> +		return;
> +	}
> +
> +	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> +		(unsigned long)(crash_size >> 20),
> +		(unsigned long)(crash_base >> 20),
> +		(unsigned long)(total_mem >> 20));
> +
> +	crashk_res.start = crash_base;
> +	crashk_res.end   = crash_base + crash_size - 1;
> +}
> +#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */
> +
>   Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>   			  void *data, size_t data_len)
>   {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5a5d192a89ac307..1e0d4909bbb6b77 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -54,23 +54,6 @@ note_buf_t __percpu *crash_notes;
>   /* Flag to indicate we are going to kexec a new kernel */
>   bool kexec_in_progress = false;
>   
> -
> -/* Location of the reserved area for the crash kernel */
> -struct resource crashk_res = {
> -	.name  = "Crash kernel",
> -	.start = 0,
> -	.end   = 0,
> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> -	.desc  = IORES_DESC_CRASH_KERNEL
> -};
> -struct resource crashk_low_res = {
> -	.name  = "Crash kernel",
> -	.start = 0,
> -	.end   = 0,
> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> -	.desc  = IORES_DESC_CRASH_KERNEL
> -};
> -
>   int kexec_should_crash(struct task_struct *p)
>   {
>   	/*
John Donnelly Dec. 13, 2021, 2:30 p.m. UTC | #9
On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> Introduce macro CRASH_ALIGN for alignment, macro CRASH_ADDR_LOW_MAX
> for upper bound of low crash memory, macro CRASH_ADDR_HIGH_MAX for
> upper bound of high crash memory, use macros instead.
> 
> Besides, keep consistent with x86, use CRASH_ALIGN as the lower bound
> of crash kernel reservation.
> 
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>

  Acked-by: John Donnelly <john.p.donnelly@oracle.com>

> ---
>   arch/arm64/include/asm/kexec.h | 6 ++++++
>   arch/arm64/mm/init.c           | 4 ++--
>   2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> index 9839bfc163d7147..1b9edc69f0244ca 100644
> --- a/arch/arm64/include/asm/kexec.h
> +++ b/arch/arm64/include/asm/kexec.h
> @@ -25,6 +25,12 @@
>   
>   #define KEXEC_ARCH KEXEC_ARCH_AARCH64
>   
> +/* 2M alignment for crash kernel regions */
> +#define CRASH_ALIGN	SZ_2M
> +
> +#define CRASH_ADDR_LOW_MAX	arm64_dma_phys_limit
> +#define CRASH_ADDR_HIGH_MAX	MEMBLOCK_ALLOC_ACCESSIBLE
> +
>   #ifndef __ASSEMBLY__
>   
>   /**
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index a8834434af99ae0..be4595dc7459115 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
>   static void __init reserve_crashkernel(void)
>   {
>   	unsigned long long crash_base, crash_size;
> -	unsigned long long crash_max = arm64_dma_phys_limit;
> +	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>   	int ret;
>   
>   	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> @@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
>   		crash_max = crash_base + crash_size;
>   
>   	/* Current arm64 boot protocol requires 2MB alignment */
> -	crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
> +	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>   					       crash_base, crash_max);
>   	if (!crash_base) {
>   		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
John Donnelly Dec. 13, 2021, 2:31 p.m. UTC | #10
On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
> 
> To solve these issues, change the behavior of crashkernel=X and
> introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
> in DMA zone, and fall back to high allocation if it fails.
> We can also use "crashkernel=X,high" to select a region above DMA zone,
> which also tries to allocate at least 256M in DMA zone automatically.
> "crashkernel=Y,low" can be used to allocate specified size low memory.
> 
> Another minor change, there may be two regions reserved for crash
> dump kernel, in order to distinct from the high region and make no
> effect to the use of existing kexec-tools, rename the low region as
> "Crash kernel (low)".
> 
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>

  Acked-by: John Donnelly <john.p.donnelly@oracle.com>

> ---
>   arch/arm64/Kconfig                     |  1 +
>   arch/arm64/include/asm/kexec.h         |  4 ++
>   arch/arm64/kernel/machine_kexec_file.c | 12 +++++-
>   arch/arm64/kernel/setup.c              | 13 +++++-
>   arch/arm64/mm/init.c                   | 59 +++++---------------------
>   5 files changed, 38 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c4207cf9bb17ffb..4b99efa36da3793 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -95,6 +95,7 @@ config ARM64
>   	select ARCH_WANT_FRAME_POINTERS
>   	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
>   	select ARCH_WANT_LD_ORPHAN_WARN
> +	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
>   	select ARCH_WANTS_NO_INSTR
>   	select ARCH_HAS_UBSAN_SANITIZE_ALL
>   	select ARM_AMBA
> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> index 1b9edc69f0244ca..3bde0079925d771 100644
> --- a/arch/arm64/include/asm/kexec.h
> +++ b/arch/arm64/include/asm/kexec.h
> @@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {}
>   static inline void crash_post_resume(void) {}
>   #endif
>   
> +#ifdef CONFIG_KEXEC_CORE
> +extern void __init reserve_crashkernel(void);
> +#endif
> +
>   #if defined(CONFIG_KEXEC_CORE)
>   void cpu_soft_restart(unsigned long el2_switch, unsigned long entry,
>   		      unsigned long arg0, unsigned long arg1,
> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index 63634b4d72c158f..6f3fa059ca4e816 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
>   
>   	/* Exclude crashkernel region */
>   	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
> +	if (ret)
> +		goto out;
> +
> +	if (crashk_low_res.end) {
> +		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
> +		if (ret)
> +			goto out;
> +	}
>   
> -	if (!ret)
> -		ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
> +	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
>   
> +out:
>   	kfree(cmem);
>   	return ret;
>   }
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index be5f85b0a24de69..4bb2e55366be64d 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -248,7 +248,18 @@ static void __init request_standard_resources(void)
>   		    kernel_data.end <= res->end)
>   			request_resource(res, &kernel_data);
>   #ifdef CONFIG_KEXEC_CORE
> -		/* Userspace will find "Crash kernel" region in /proc/iomem. */
> +		/*
> +		 * Userspace will find "Crash kernel" or "Crash kernel (low)"
> +		 * region in /proc/iomem.
> +		 * In order to distinct from the high region and make no effect
> +		 * to the use of existing kexec-tools, rename the low region as
> +		 * "Crash kernel (low)".
> +		 */
> +		if (crashk_low_res.end && crashk_low_res.start >= res->start &&
> +				crashk_low_res.end <= res->end) {
> +			crashk_low_res.name = "Crash kernel (low)";
> +			request_resource(res, &crashk_low_res);
> +		}
>   		if (crashk_res.end && crashk_res.start >= res->start &&
>   		    crashk_res.end <= res->end)
>   			request_resource(res, &crashk_res);
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index be4595dc7459115..85c83e4eff2b6c4 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -36,6 +36,7 @@
>   #include <asm/fixmap.h>
>   #include <asm/kasan.h>
>   #include <asm/kernel-pgtable.h>
> +#include <asm/kexec.h>
>   #include <asm/kvm_host.h>
>   #include <asm/memory.h>
>   #include <asm/numa.h>
> @@ -64,57 +65,11 @@ EXPORT_SYMBOL(memstart_addr);
>    */
>   phys_addr_t arm64_dma_phys_limit __ro_after_init;
>   
> -#ifdef CONFIG_KEXEC_CORE
> -/*
> - * reserve_crashkernel() - reserves memory for crash kernel
> - *
> - * This function reserves memory area given in "crashkernel=" kernel command
> - * line parameter. The memory reserved is used by dump capture kernel when
> - * primary kernel is crashing.
> - */
> +#ifndef CONFIG_KEXEC_CORE
>   static void __init reserve_crashkernel(void)
>   {
> -	unsigned long long crash_base, crash_size;
> -	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> -	int ret;
> -
> -	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> -				&crash_size, &crash_base);
> -	/* no crashkernel= or invalid value specified */
> -	if (ret || !crash_size)
> -		return;
> -
> -	crash_size = PAGE_ALIGN(crash_size);
> -
> -	/* User specifies base address explicitly. */
> -	if (crash_base)
> -		crash_max = crash_base + crash_size;
> -
> -	/* Current arm64 boot protocol requires 2MB alignment */
> -	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> -					       crash_base, crash_max);
> -	if (!crash_base) {
> -		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> -			crash_size);
> -		return;
> -	}
> -
> -	pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
> -		crash_base, crash_base + crash_size, crash_size >> 20);
> -
> -	/*
> -	 * The crashkernel memory will be removed from the kernel linear
> -	 * map. Inform kmemleak so that it won't try to access it.
> -	 */
> -	kmemleak_ignore_phys(crash_base);
> -	crashk_res.start = crash_base;
> -	crashk_res.end = crash_base + crash_size - 1;
>   }
> -#else
> -static void __init reserve_crashkernel(void)
> -{
> -}
> -#endif /* CONFIG_KEXEC_CORE */
> +#endif
>   
>   /*
>    * Return the maximum physical address for a zone accessible by the given bits
> @@ -362,6 +317,14 @@ void __init bootmem_init(void)
>   	 * reserved, so do it here.
>   	 */
>   	reserve_crashkernel();
> +#ifdef CONFIG_KEXEC_CORE
> +	/*
> +	 * The low region is intended to be used for crash dump kernel devices,
> +	 * just mark the low region as "nomap" simply.
> +	 */
> +	if (crashk_low_res.end)
> +		memblock_mark_nomap(crashk_low_res.start, resource_size(&crashk_low_res));
> +#endif
>   
>   	memblock_dump_all();
>   }
John Donnelly Dec. 13, 2021, 2:34 p.m. UTC | #11
On 12/10/21 12:55 AM, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> For arm64, the behavior of crashkernel=X has been changed, which
> tries low allocation in DMA zone and fall back to high allocation
> if it fails.
> 
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> specified size low memory.
> 
> So update the Documentation.
> 
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

  Acked-by: John Donnelly <john.p.donnelly@oracle.com>

> ---
>   Documentation/admin-guide/kdump/kdump.rst       | 11 +++++++++--
>   Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++--
>   2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
> index cb30ca3df27c9b2..d4c287044be0c70 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -361,8 +361,15 @@ Boot into System Kernel
>      kernel will automatically locate the crash kernel image within the
>      first 512MB of RAM if X is not given.
>   
> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
> -   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
> +   On arm64, use "crashkernel=X" to try low allocation in DMA zone and
> +   fall back to high allocation if it fails.
> +   We can also use "crashkernel=X,high" to select a high region above
> +   DMA zone, which also tries to allocate at least 256M low memory in
> +   DMA zone automatically.
> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
> +   Use "crashkernel=Y@X" if you really have to reserve memory from
> +   specified start address X. Note that the start address of the kernel,
> +   X if explicitly specified, must be aligned to 2MiB (0x200000).
>   
>   Load the Dump-capture Kernel
>   ============================
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 9725c546a0d46db..91f3a8dc537d404 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -783,6 +783,9 @@
>   			[KNL, X86-64] Select a region under 4G first, and
>   			fall back to reserve region above 4G when '@offset'
>   			hasn't been specified.
> +			[KNL, ARM64] Try low allocation in DMA zone and fall back
> +			to high allocation if it fails when '@offset' hasn't been
> +			specified.
>   			See Documentation/admin-guide/kdump/kdump.rst for further details.
>   
>   	crashkernel=range1:size1[,range2:size2,...][@offset]
> @@ -799,6 +802,8 @@
>   			Otherwise memory region will be allocated below 4G, if
>   			available.
>   			It will be ignored if crashkernel=X is specified.
> +			[KNL, ARM64] range in high memory.
> +			Allow kernel to allocate physical memory region from top.
>   	crashkernel=size[KMG],low
>   			[KNL, X86-64] range under 4G. When crashkernel=X,high
>   			is passed, kernel could allocate physical memory region
> @@ -807,13 +812,15 @@
>   			requires at least 64M+32K low memory, also enough extra
>   			low memory is needed to make sure DMA buffers for 32-bit
>   			devices won't run out. Kernel would try to allocate at
> -			at least 256M below 4G automatically.
> +			least 256M below 4G automatically.
>   			This one let user to specify own low range under 4G
>   			for second kernel instead.
>   			0: to disable low allocation.
>   			It will be ignored when crashkernel=X,high is not used
>   			or memory reserved is below 4G.
> -
> +			[KNL, ARM64] range in low memory.
> +			This one let user to specify a low range in DMA zone for
> +			crash dump kernel.
>   	cryptomgr.notests
>   			[KNL] Disable crypto self-tests
>
John Donnelly Dec. 13, 2021, 2:37 p.m. UTC | #12
On 12/10/21 12:55 AM, Zhen Lei wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
> 
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone and fall back to high
> allocation if it fails.
> 
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> specified size low memory.
> 
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
> 
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
> 
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
> 
> This patchset contains the following 10 patches:
> 
> 0001-0004 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
> 0005 makes functions reserve_crashkernel[_low]() generic.
> 0006-0007 reimplements arm64 crashkernel=X.
> 0008-0009 adds memory for devices by DT property linux,usable-memory-range.
> 0010 updates the doc.
> 
> Changes since [v16]
> - Because no functional changes in this version, so add
>    "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9
> - Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8
> - Update patch 9 based on the review comments of Rob Herring
> - As Catalin Marinas's suggestion, merge the implementation of
>    ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
>    contents of X86 and ARM64 do not overlap, and reduce unnecessary
>    temporary differences.
> 
> Changes since [v15]
> -  Aggregate the processing of "linux,usable-memory-range" into one function.
>     Only patch 9-10 have been updated.
> 
> Changes since [v14]
> - Recovering the requirement that the CrashKernel memory regions on X86
>    only requires 1 MiB alignment.
> - Combine patches 5 and 6 in v14 into one. The compilation warning fixed
>    by patch 6 was introduced by patch 5 in v14.
> - As with crashk_res, crashk_low_res is also processed by
>    crash_exclude_mem_range() in patch 7.
> - Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
>    has removed the architecture-specific code, extend the property "linux,usable-memory-range"
>    in the platform-agnostic FDT core code. See patch 9.
> - Discard the x86 description update in the document, because the description
>    has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
> - Change "arm64" to "ARM64" in Doc.
> 
> 
> Changes since [v13]
> - Rebased on top of 5.11-rc5.
> - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
> Since reserve_crashkernel[_low]() implementations are quite similar on
> other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
> arch/Kconfig and select this by X86 and ARM64.
> - Some minor cleanup.
> 
> Changes since [v12]
> - Rebased on top of 5.10-rc1.
> - Keep CRASH_ALIGN as 16M suggested by Dave.
> - Drop patch "kdump: add threshold for the required memory".
> - Add Tested-by from John.
> 
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
> 
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
> 
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
> 
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
> 
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
> 
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
> 
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
> 
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> 
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
> 
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
> 
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
> 
> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> [2]: https://github.com/robherring/dt-schema/pull/19
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> [v8]: https://lkml.org/lkml/2020/5/21/213
> [v9]: https://lkml.org/lkml/2020/6/28/73
> [v10]: https://lkml.org/lkml/2020/7/2/1443
> [v11]: https://lkml.org/lkml/2020/8/1/150
> [v12]: https://lkml.org/lkml/2020/9/7/1037
> [v13]: https://lkml.org/lkml/2020/10/31/34
> [v14]: https://lkml.org/lkml/2021/1/30/53
> [v15]: https://lkml.org/lkml/2021/10/19/1405
> [v16]: https://lkml.org/lkml/2021/11/23/435
> 
> 
> Chen Zhou (9):
>    x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
>    x86: kdump: make the lower bound of crash kernel reservation
>      consistent
>    x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
>      reserve_crashkernel()
>    x86: kdump: move xen_pv_domain() check and insert_resource() to
>      setup_arch()
>    x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
>    arm64: kdump: introduce some macros for crash kernel reservation
>    arm64: kdump: reimplement crashkernel=X
>    of: fdt: Add memory for devices by DT property
>      "linux,usable-memory-range"
>    kdump: update Documentation about crashkernel
> 
> Zhen Lei (1):
>    of: fdt: Aggregate the processing of "linux,usable-memory-range"
> 
>   Documentation/admin-guide/kdump/kdump.rst     |  11 +-
>   .../admin-guide/kernel-parameters.txt         |  11 +-
>   arch/Kconfig                                  |   3 +
>   arch/arm64/Kconfig                            |   1 +
>   arch/arm64/include/asm/kexec.h                |  10 ++
>   arch/arm64/kernel/machine_kexec_file.c        |  12 +-
>   arch/arm64/kernel/setup.c                     |  13 +-
>   arch/arm64/mm/init.c                          |  59 ++-----
>   arch/x86/Kconfig                              |   2 +
>   arch/x86/include/asm/elf.h                    |   3 +
>   arch/x86/include/asm/kexec.h                  |  31 +++-
>   arch/x86/kernel/setup.c                       | 163 ++----------------
>   drivers/of/fdt.c                              |  42 +++--
>   include/linux/crash_core.h                    |   3 +
>   include/linux/kexec.h                         |   2 -
>   kernel/crash_core.c                           | 156 +++++++++++++++++
>   kernel/kexec_core.c                           |  17 --
>   17 files changed, 301 insertions(+), 238 deletions(-)
> 


Hello ,

After 2 years, and 17 versions, can we now get this series promoted into 
a build ?


Thank you,

JD
Will Deacon Dec. 13, 2021, 6:50 p.m. UTC | #13
On Fri, Dec 10, 2021 at 03:15:00PM +0800, Kefeng Wang wrote:
> 
> On 2021/12/10 14:55, Zhen Lei wrote:
> > There are following issues in arm64 kdump:
> > 1. We use crashkernel=X to reserve crashkernel below 4G, which
> > will fail when there is no enough low memory.
> > 2. If reserving crashkernel above 4G, in this case, crash dump
> > kernel will boot failure because there is no low memory available
> > for allocation.
> > 
> > To solve these issues, change the behavior of crashkernel=X.
> > crashkernel=X tries low allocation in DMA zone and fall back to high
> > allocation if it fails.
> > 
> > We can also use "crashkernel=X,high" to select a high region above
> > DMA zone, which also tries to allocate at least 256M low memory in
> > DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> > specified size low memory.
> > 
> > When reserving crashkernel in high memory, some low memory is reserved
> > for crash dump kernel devices. So there may be two regions reserved for
> > crash dump kernel.
> > In order to distinct from the high region and make no effect to the use
> > of existing kexec-tools, rename the low region as "Crash kernel (low)",
> > and pass the low region by reusing DT property
> > "linux,usable-memory-range". We made the low memory region as the last
> > range of "linux,usable-memory-range" to keep compatibility with existing
> > user-space and older kdump kernels.
> > 
> > Besides, we need to modify kexec-tools:
> > arm64: support more than one crash kernel regions(see [1])
> > 
> > Another update is document about DT property 'linux,usable-memory-range':
> > schemas: update 'linux,usable-memory-range' node schema(see [2])
> > 
> > This patchset contains the following 10 patches:
> > 
> > 0001-0004 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
> > 0005 makes functions reserve_crashkernel[_low]() generic.
> > 0006-0007 reimplements arm64 crashkernel=X.
> > 0008-0009 adds memory for devices by DT property linux,usable-memory-range.
> > 0010 updates the doc.
> > 
> > Changes since [v16]
> > - Because no functional changes in this version, so add
> >    "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9
> > - Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8
> > - Update patch 9 based on the review comments of Rob Herring
> > - As Catalin Marinas's suggestion, merge the implementation of
> >    ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
> >    contents of X86 and ARM64 do not overlap, and reduce unnecessary
> >    temporary differences.
> 
> An Internal review has been done, so for this series,
> 
> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

That's good, but it would be _much_ better if you could do these reviews
on the public mailing list in future. Is that possible? Otherwise, it's
hard for maintainers to know what the reviews actually covered.

Thanks,

Will
Catalin Marinas Dec. 13, 2021, 6:56 p.m. UTC | #14
On Mon, Dec 13, 2021 at 08:37:48AM -0600, john.p.donnelly@oracle.com wrote:
> On 12/10/21 12:55 AM, Zhen Lei wrote:
> > There are following issues in arm64 kdump:
> > 1. We use crashkernel=X to reserve crashkernel below 4G, which
> > will fail when there is no enough low memory.
> > 2. If reserving crashkernel above 4G, in this case, crash dump
> > kernel will boot failure because there is no low memory available
> > for allocation.
> > 
> > To solve these issues, change the behavior of crashkernel=X.
> > crashkernel=X tries low allocation in DMA zone and fall back to high
> > allocation if it fails.
> > 
> > We can also use "crashkernel=X,high" to select a high region above
> > DMA zone, which also tries to allocate at least 256M low memory in
> > DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> > specified size low memory.
> > 
> > When reserving crashkernel in high memory, some low memory is reserved
> > for crash dump kernel devices. So there may be two regions reserved for
> > crash dump kernel.
> > In order to distinct from the high region and make no effect to the use
> > of existing kexec-tools, rename the low region as "Crash kernel (low)",
> > and pass the low region by reusing DT property
> > "linux,usable-memory-range". We made the low memory region as the last
> > range of "linux,usable-memory-range" to keep compatibility with existing
> > user-space and older kdump kernels.
> > 
> > Besides, we need to modify kexec-tools:
> > arm64: support more than one crash kernel regions(see [1])
> > 
> > Another update is document about DT property 'linux,usable-memory-range':
> > schemas: update 'linux,usable-memory-range' node schema(see [2])
> > 
> > This patchset contains the following 10 patches:
> > 
> > 0001-0004 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
> > 0005 makes functions reserve_crashkernel[_low]() generic.
> > 0006-0007 reimplements arm64 crashkernel=X.
> > 0008-0009 adds memory for devices by DT property linux,usable-memory-range.
> > 0010 updates the doc.
> > 
> > Changes since [v16]
> > - Because no functional changes in this version, so add
> >    "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9
> > - Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8
> > - Update patch 9 based on the review comments of Rob Herring
> > - As Catalin Marinas's suggestion, merge the implementation of
> >    ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
> >    contents of X86 and ARM64 do not overlap, and reduce unnecessary
> >    temporary differences.
> > 
> > Changes since [v15]
> > -  Aggregate the processing of "linux,usable-memory-range" into one function.
> >     Only patch 9-10 have been updated.
> > 
> > Changes since [v14]
> > - Recovering the requirement that the CrashKernel memory regions on X86
> >    only requires 1 MiB alignment.
> > - Combine patches 5 and 6 in v14 into one. The compilation warning fixed
> >    by patch 6 was introduced by patch 5 in v14.
> > - As with crashk_res, crashk_low_res is also processed by
> >    crash_exclude_mem_range() in patch 7.
> > - Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
> >    has removed the architecture-specific code, extend the property "linux,usable-memory-range"
> >    in the platform-agnostic FDT core code. See patch 9.
> > - Discard the x86 description update in the document, because the description
> >    has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
> > - Change "arm64" to "ARM64" in Doc.
> > 
> > 
> > Changes since [v13]
> > - Rebased on top of 5.11-rc5.
> > - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
> > Since reserve_crashkernel[_low]() implementations are quite similar on
> > other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
> > arch/Kconfig and select this by X86 and ARM64.
> > - Some minor cleanup.
> > 
> > Changes since [v12]
> > - Rebased on top of 5.10-rc1.
> > - Keep CRASH_ALIGN as 16M suggested by Dave.
> > - Drop patch "kdump: add threshold for the required memory".
> > - Add Tested-by from John.
> > 
> > Changes since [v11]
> > - Rebased on top of 5.9-rc4.
> > - Make the function reserve_crashkernel() of x86 generic.
> > Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> > and arm64 use the generic version to reimplement crashkernel=X.
> > 
> > Changes since [v10]
> > - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
> > 
> > Changes since [v9]
> > - Patch 1 add Acked-by from Dave.
> > - Update patch 5 according to Dave's comments.
> > - Update chosen schema.
> > 
> > Changes since [v8]
> > - Reuse DT property "linux,usable-memory-range".
> > Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> > memory region.
> > - Fix kdump broken with ZONE_DMA reintroduced.
> > - Update chosen schema.
> > 
> > Changes since [v7]
> > - Move x86 CRASH_ALIGN to 2M
> > Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> > - Update Documentation/devicetree/bindings/chosen.txt.
> > Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> > suggested by Arnd.
> > - Add Tested-by from Jhon and pk.
> > 
> > Changes since [v6]
> > - Fix build errors reported by kbuild test robot.
> > 
> > Changes since [v5]
> > - Move reserve_crashkernel_low() into kernel/crash_core.c.
> > - Delete crashkernel=X,high.
> > - Modify crashkernel=X,low.
> > If crashkernel=X,low is specified simultaneously, reserve spcified size low
> > memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> > In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> > pass to crash dump kernel by DT property "linux,low-memory-range".
> > - Update Documentation/admin-guide/kdump/kdump.rst.
> > 
> > Changes since [v4]
> > - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> > 
> > Changes since [v3]
> > - Add memblock_cap_memory_ranges back for multiple ranges.
> > - Fix some compiling warnings.
> > 
> > Changes since [v2]
> > - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> > two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> > patch.
> > 
> > Changes since [v1]:
> > - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> > - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> > in fdt_enforce_memory_region().
> > There are at most two crash kernel regions, for two crash kernel regions
> > case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> > and then remove the memory range in the middle.
> > 
> > [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> > [2]: https://github.com/robherring/dt-schema/pull/19
> > [v1]: https://lkml.org/lkml/2019/4/2/1174
> > [v2]: https://lkml.org/lkml/2019/4/9/86
> > [v3]: https://lkml.org/lkml/2019/4/9/306
> > [v4]: https://lkml.org/lkml/2019/4/15/273
> > [v5]: https://lkml.org/lkml/2019/5/6/1360
> > [v6]: https://lkml.org/lkml/2019/8/30/142
> > [v7]: https://lkml.org/lkml/2019/12/23/411
> > [v8]: https://lkml.org/lkml/2020/5/21/213
> > [v9]: https://lkml.org/lkml/2020/6/28/73
> > [v10]: https://lkml.org/lkml/2020/7/2/1443
> > [v11]: https://lkml.org/lkml/2020/8/1/150
> > [v12]: https://lkml.org/lkml/2020/9/7/1037
> > [v13]: https://lkml.org/lkml/2020/10/31/34
> > [v14]: https://lkml.org/lkml/2021/1/30/53
> > [v15]: https://lkml.org/lkml/2021/10/19/1405
> > [v16]: https://lkml.org/lkml/2021/11/23/435
> > 
> > 
> > Chen Zhou (9):
> >    x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
> >    x86: kdump: make the lower bound of crash kernel reservation
> >      consistent
> >    x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> >      reserve_crashkernel()
> >    x86: kdump: move xen_pv_domain() check and insert_resource() to
> >      setup_arch()
> >    x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> >    arm64: kdump: introduce some macros for crash kernel reservation
> >    arm64: kdump: reimplement crashkernel=X
> >    of: fdt: Add memory for devices by DT property
> >      "linux,usable-memory-range"
> >    kdump: update Documentation about crashkernel
> > 
> > Zhen Lei (1):
> >    of: fdt: Aggregate the processing of "linux,usable-memory-range"
> > 
> >   Documentation/admin-guide/kdump/kdump.rst     |  11 +-
> >   .../admin-guide/kernel-parameters.txt         |  11 +-
> >   arch/Kconfig                                  |   3 +
> >   arch/arm64/Kconfig                            |   1 +
> >   arch/arm64/include/asm/kexec.h                |  10 ++
> >   arch/arm64/kernel/machine_kexec_file.c        |  12 +-
> >   arch/arm64/kernel/setup.c                     |  13 +-
> >   arch/arm64/mm/init.c                          |  59 ++-----
> >   arch/x86/Kconfig                              |   2 +
> >   arch/x86/include/asm/elf.h                    |   3 +
> >   arch/x86/include/asm/kexec.h                  |  31 +++-
> >   arch/x86/kernel/setup.c                       | 163 ++----------------
> >   drivers/of/fdt.c                              |  42 +++--
> >   include/linux/crash_core.h                    |   3 +
> >   include/linux/kexec.h                         |   2 -
> >   kernel/crash_core.c                           | 156 +++++++++++++++++
> >   kernel/kexec_core.c                           |  17 --
> >   17 files changed, 301 insertions(+), 238 deletions(-)
> 
> After 2 years, and 17 versions, can we now get this series promoted into a
> build ?

We don't push patches to linux-next while there are still pending
issues. The v17 looks alright though, I'll push it out for a bit more
coverage and, if there are no objections, it should land in 5.17.
Borislav Petkov Dec. 13, 2021, 6:57 p.m. UTC | #15
On Mon, Dec 13, 2021 at 08:37:48AM -0600, john.p.donnelly@oracle.com wrote:
> After 2 years, and 17 versions, can we now get this series promoted into a
> build ?

For example:

$ ./scripts/get_maintainer.pl -f Documentation/admin-guide/kdump/kdump.rst
Baoquan He <bhe@redhat.com> (maintainer:KDUMP)
Vivek Goyal <vgoyal@redhat.com> (reviewer:KDUMP)
Dave Young <dyoung@redhat.com> (reviewer:KDUMP)
Jonathan Corbet <corbet@lwn.net> (maintainer:DOCUMENTATION)
kexec@lists.infradead.org (open list:KDUMP)

I see only two acks from Baoquan.

So yes, this needs to go through the normal review process first.
Borislav Petkov Dec. 13, 2021, 7:54 p.m. UTC | #16
> Subject: Re: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

From Documentation/process/maintainer-tip.rst:

"Patch subject
 ^^^^^^^^^^^^^

The tip tree preferred format for patch subject prefixes is
'subsys/component:', e.g. 'x86/apic:', 'x86/mm/fault:', 'sched/fair:',
'genirq/core:'. Please do not use file names or complete file paths as
prefix. 'git log path/to/file' should give you a reasonable hint in most
cases.

The condensed patch description in the subject line should start with a
uppercase letter and should be written in imperative tone."

Please fix 1-5 for your next submission.

Thx.
Leizhen (ThunderTown) Dec. 14, 2021, 8:41 a.m. UTC | #17
On 2021/12/13 21:17, Baoquan He wrote:
> On 12/10/21 at 02:55pm, Zhen Lei wrote:
>> From: Chen Zhou <chenzhou10@huawei.com>
>>
>> Move CRASH_ALIGN to header asm/kexec.h for later use.
>>
>> Suggested-by: Dave Young <dyoung@redhat.com>
>> Suggested-by: Baoquan He <bhe@redhat.com>
> 
> I remember Dave and I discussed and suggested this when reviewing.
> You can remove my Suggested-by.

OK, I will do it.

> 
> For this one, I would like to add ack:
> 
> Acked-by: Baoquan He <bhe@redhat.com>
> 
>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
>> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
>> ---
>>  arch/x86/include/asm/kexec.h | 3 +++
>>  arch/x86/kernel/setup.c      | 3 ---
>>  2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 11b7c06e2828c30..3a22e65262aa70b 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -18,6 +18,9 @@
>>  
>>  # define KEXEC_CONTROL_CODE_MAX_SIZE	2048
>>  
>> +/* 16M alignment for crash kernel regions */
>> +#define CRASH_ALIGN		SZ_16M
>> +
>>  #ifndef __ASSEMBLY__
>>  
>>  #include <linux/string.h>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 6a190c7f4d71b05..5cc60996eac56d6 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -392,9 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>>  
>>  #ifdef CONFIG_KEXEC_CORE
>>  
>> -/* 16M alignment for crash kernel regions */
>> -#define CRASH_ALIGN		SZ_16M
>> -
>>  /*
>>   * Keep the crash kernel below this limit.
>>   *
>> -- 
>> 2.25.1
>>
> 
> .
>
Leizhen (ThunderTown) Dec. 14, 2021, 8:48 a.m. UTC | #18
On 2021/12/13 21:37, Baoquan He wrote:
> On 12/10/21 at 02:55pm, Zhen Lei wrote:
>> From: Chen Zhou <chenzhou10@huawei.com>
>>
>> The lower bounds of crash kernel reservation and crash kernel low
>> reservation are different, use the consistent value CRASH_ALIGN.
>>
>> Suggested-by: Dave Young <dyoung@redhat.com>
>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> 
> You may need add Co-developed-by to clarify who is author, and who is
> co-author. Please check section "When to use Acked-by:, Cc:, and Co-developed-by:"
> of Documentation/process/submitting-patches.rst. Otherwise, 

Okay, thanks for the heads-up. I will modify it.

> 
> Acked-by: Baoquan He <bhe@redhat.com>
> 
>> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
>> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
>> ---
>>  arch/x86/kernel/setup.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 5cc60996eac56d6..6424ee4f23da2cf 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
>>  			return 0;
>>  	}
>>  
>> -	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
>> +	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
>> +			CRASH_ADDR_LOW_MAX);
>>  	if (!low_base) {
>>  		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>>  		       (unsigned long)(low_size >> 20));
>> -- 
>> 2.25.1
>>
> 
> .
>
Baoquan He Dec. 14, 2021, 8:54 a.m. UTC | #19
On 12/10/21 at 02:55pm, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> To make the functions reserve_crashkernel() as generic,
> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
> 
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

If you made change to this patch, please remove the old Acked-by. If you
didn't contribute change, Signed-off-by should be taken off.

Compared this with the version I acked, only see
memblock_free() -> memblock_phys_free() update which should be done
from the rebase.

So ack this one again, and please also consider adding Co-developed-by.

> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> Acked-by: Baoquan He <bhe@redhat.com>
> ---
>  arch/x86/kernel/setup.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 6424ee4f23da2cf..bb2a0973b98059e 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void)
>  	if (!crash_base) {
>  		/*
>  		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> -		 * crashkernel=x,high reserves memory over 4G, also allocates
> -		 * 256M extra low memory for DMA buffers and swiotlb.
> +		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> +		 * also allocates 256M extra low memory for DMA buffers
> +		 * and swiotlb.
>  		 * But the extra memory is not required for all machines.
>  		 * So try low memory first and fall back to high memory
>  		 * unless "crashkernel=size[KMG],high" is specified.
> @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
>  		}
>  	}
>  
> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> +	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>  		memblock_phys_free(crash_base, crash_size);
>  		return;
>  	}
> -- 
> 2.25.1
>
Leizhen (ThunderTown) Dec. 14, 2021, 9:27 a.m. UTC | #20
On 2021/12/14 3:54, Borislav Petkov wrote:
>> Subject: Re: [PATCH v17 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
> 
>>From Documentation/process/maintainer-tip.rst:
> 
> "Patch subject
>  ^^^^^^^^^^^^^
> 
> The tip tree preferred format for patch subject prefixes is
> 'subsys/component:', e.g. 'x86/apic:', 'x86/mm/fault:', 'sched/fair:',
> 'genirq/core:'. Please do not use file names or complete file paths as
> prefix. 'git log path/to/file' should give you a reasonable hint in most
> cases.
> 
> The condensed patch description in the subject line should start with a
> uppercase letter and should be written in imperative tone."
> 
> Please fix 1-5 for your next submission.

OK. I will update them.

> 
> Thx.
>
Borislav Petkov Dec. 14, 2021, 9:38 a.m. UTC | #21
On Tue, Dec 14, 2021 at 04:54:40PM +0800, Baoquan He wrote:
> If you didn't contribute change, Signed-off-by should be taken off.

The second SOB is correct here. I'll let you figure it out what it
means.

Hint: Documentation/process/submitting-patches.rst
Baoquan He Dec. 14, 2021, 9:56 a.m. UTC | #22
On 12/14/21 at 10:38am, Borislav Petkov wrote:
> On Tue, Dec 14, 2021 at 04:54:40PM +0800, Baoquan He wrote:
> > If you didn't contribute change, Signed-off-by should be taken off.
> 
> The second SOB is correct here. I'll let you figure it out what it
> means.
> 
> Hint: Documentation/process/submitting-patches.rst

Ah, OK, I see the new paragraph from you added in below commit. 

commit 9bf19b78a203b6ed20ed7b5d7222f5ef7a49aed4
Author: Borislav Petkov <bp@alien8.de>
Date:   Thu Dec 17 19:37:56 2020 +0100

    Documentation/submitting-patches: Document the SoB chain

Hi Lei,

I take back the wrong comment on Signed-off-by tag since you post the
patch, please ignore them.
Baoquan He Dec. 14, 2021, 10:45 a.m. UTC | #23
On 12/10/21 at 02:55pm, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> Make the functions reserve_crashkernel[_low]() as generic. Since
> reserve_crashkernel[_low]() implementations are quite similar on other
> architectures as well, we can have more users of this later.
> 
> So have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and
> select this by X86.
> 
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> ---
>  arch/Kconfig                 |   3 +
>  arch/x86/Kconfig             |   2 +
>  arch/x86/include/asm/elf.h   |   3 +
>  arch/x86/include/asm/kexec.h |  28 ++++++-
>  arch/x86/kernel/setup.c      | 143 +-------------------------------
>  include/linux/crash_core.h   |   3 +
>  include/linux/kexec.h        |   2 -
>  kernel/crash_core.c          | 156 +++++++++++++++++++++++++++++++++++
>  kernel/kexec_core.c          |  17 ----
>  9 files changed, 194 insertions(+), 163 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index d3c4ab249e9c275..7bdb32c41985dc5 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -24,6 +24,9 @@ config KEXEC_ELF
>  config HAVE_IMA_KEXEC
>  	bool
>  
> +config ARCH_WANT_RESERVE_CRASH_KERNEL
> +	bool
> +
>  config SET_FS
>  	bool
>  
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5c2ccb85f2efb86..bd78ed8193079b9 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -12,6 +12,7 @@ config X86_32
>  	depends on !64BIT
>  	# Options that are inherently 32-bit kernel only:
>  	select ARCH_WANT_IPC_PARSE_VERSION
> +	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
>  	select CLKSRC_I8253
>  	select CLONE_BACKWARDS
>  	select GENERIC_VDSO_32
> @@ -28,6 +29,7 @@ config X86_64
>  	select ARCH_HAS_GIGANTIC_PAGE
>  	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
>  	select ARCH_USE_CMPXCHG_LOCKREF
> +	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
>  	select HAVE_ARCH_SOFT_DIRTY
>  	select MODULES_USE_ELF_RELA
>  	select NEED_DMA_MAP_STATE
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 29fea180a6658e8..7a6c36cff8331f5 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled;
>  
>  #define elf_check_arch(x)	elf_check_arch_ia32(x)
>  
> +/* We can also handle crash dumps from 64 bit kernel. */
> +# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
> +
>  /* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx
>     contains a pointer to a function which might be registered using `atexit'.
>     This provides a mean for the dynamic linker to call DT_FINI functions for
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 3a22e65262aa70b..3ff38a1353a2b86 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -21,6 +21,27 @@
>  /* 16M alignment for crash kernel regions */
>  #define CRASH_ALIGN		SZ_16M
>  
> +/*
> + * Keep the crash kernel below this limit.
> + *
> + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
> + * due to mapping restrictions.
> + *
> + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
> + * the upper limit of system RAM in 4-level paging mode. Since the kdump
> + * jump could be from 5-level paging to 4-level paging, the jump will fail if
> + * the kernel is put above 64 TB, and during the 1st kernel bootup there's
> + * no good way to detect the paging mode of the target kernel which will be
> + * loaded for dumping.
> + */
> +#ifdef CONFIG_X86_32
> +# define CRASH_ADDR_LOW_MAX	SZ_512M
> +# define CRASH_ADDR_HIGH_MAX	SZ_512M
> +#else
> +# define CRASH_ADDR_LOW_MAX	SZ_4G
> +# define CRASH_ADDR_HIGH_MAX	SZ_64T
> +#endif
> +
>  #ifndef __ASSEMBLY__
>  
>  #include <linux/string.h>
> @@ -51,9 +72,6 @@ struct kimage;
>  
>  /* The native architecture */
>  # define KEXEC_ARCH KEXEC_ARCH_386
> -
> -/* We can also handle crash dumps from 64 bit kernel. */
> -# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
>  #else
>  /* Maximum physical address we can use pages from */
>  # define KEXEC_SOURCE_MEMORY_LIMIT      (MAXMEM-1)
> @@ -195,6 +213,10 @@ typedef void crash_vmclear_fn(void);
>  extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
>  extern void kdump_nmi_shootdown_cpus(void);
>  
> +#ifdef CONFIG_KEXEC_CORE
> +extern void __init reserve_crashkernel(void);
> +#endif
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_X86_KEXEC_H */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 7ae00716a208f82..5519baa7f4b964e 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -39,6 +39,7 @@
>  #include <asm/io_apic.h>
>  #include <asm/kasan.h>
>  #include <asm/kaslr.h>
> +#include <asm/kexec.h>
>  #include <asm/mce.h>
>  #include <asm/mtrr.h>
>  #include <asm/realmode.h>
> @@ -386,147 +387,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>  	}
>  }
>  
> -/*
> - * --------- Crashkernel reservation ------------------------------
> - */
> -
> -#ifdef CONFIG_KEXEC_CORE
> -
> -/*
> - * Keep the crash kernel below this limit.
> - *
> - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
> - * due to mapping restrictions.
> - *
> - * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
> - * the upper limit of system RAM in 4-level paging mode. Since the kdump
> - * jump could be from 5-level paging to 4-level paging, the jump will fail if
> - * the kernel is put above 64 TB, and during the 1st kernel bootup there's
> - * no good way to detect the paging mode of the target kernel which will be
> - * loaded for dumping.
> - */
> -#ifdef CONFIG_X86_32
> -# define CRASH_ADDR_LOW_MAX	SZ_512M
> -# define CRASH_ADDR_HIGH_MAX	SZ_512M
> -#else
> -# define CRASH_ADDR_LOW_MAX	SZ_4G
> -# define CRASH_ADDR_HIGH_MAX	SZ_64T
> -#endif
> -
> -static int __init reserve_crashkernel_low(void)
> -{
> -#ifdef CONFIG_X86_64
> -	unsigned long long base, low_base = 0, low_size = 0;
> -	unsigned long low_mem_limit;
> -	int ret;
> -
> -	low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
> -
> -	/* crashkernel=Y,low */
> -	ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
> -	if (ret) {
> -		/*
> -		 * two parts from kernel/dma/swiotlb.c:
> -		 * -swiotlb size: user-specified with swiotlb= or default.
> -		 *
> -		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> -		 * to 8M for other buffers that may need to stay low too. Also
> -		 * make sure we allocate enough extra low memory so that we
> -		 * don't run out of DMA buffers for 32-bit devices.
> -		 */
> -		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> -	} else {
> -		/* passed with crashkernel=0,low ? */
> -		if (!low_size)
> -			return 0;
> -	}
> -
> -	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> -			CRASH_ADDR_LOW_MAX);
> -	if (!low_base) {
> -		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> -		       (unsigned long)(low_size >> 20));
> -		return -ENOMEM;
> -	}
> -
> -	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
> -		(unsigned long)(low_size >> 20),
> -		(unsigned long)(low_base >> 20),
> -		(unsigned long)(low_mem_limit >> 20));
> -
> -	crashk_low_res.start = low_base;
> -	crashk_low_res.end   = low_base + low_size - 1;
> -#endif
> -	return 0;
> -}
> -
> -static void __init reserve_crashkernel(void)
> -{
> -	unsigned long long crash_size, crash_base, total_mem;
> -	bool high = false;
> -	int ret;
> -
> -	total_mem = memblock_phys_mem_size();
> -
> -	/* crashkernel=XM */
> -	ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
> -	if (ret != 0 || crash_size <= 0) {
> -		/* crashkernel=X,high */
> -		ret = parse_crashkernel_high(boot_command_line, total_mem,
> -					     &crash_size, &crash_base);
> -		if (ret != 0 || crash_size <= 0)
> -			return;
> -		high = true;
> -	}
> -
> -	/* 0 means: find the address automatically */
> -	if (!crash_base) {
> -		/*
> -		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> -		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> -		 * also allocates 256M extra low memory for DMA buffers
> -		 * and swiotlb.
> -		 * But the extra memory is not required for all machines.
> -		 * So try low memory first and fall back to high memory
> -		 * unless "crashkernel=size[KMG],high" is specified.
> -		 */
> -		if (!high)
> -			crash_base = memblock_phys_alloc_range(crash_size,
> -						CRASH_ALIGN, CRASH_ALIGN,
> -						CRASH_ADDR_LOW_MAX);
> -		if (!crash_base)
> -			crash_base = memblock_phys_alloc_range(crash_size,
> -						CRASH_ALIGN, CRASH_ALIGN,
> -						CRASH_ADDR_HIGH_MAX);
> -		if (!crash_base) {
> -			pr_info("crashkernel reservation failed - No suitable area found.\n");
> -			return;
> -		}
> -	} else {
> -		unsigned long long start;
> -
> -		start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
> -						  crash_base + crash_size);
> -		if (start != crash_base) {
> -			pr_info("crashkernel reservation failed - memory is in use.\n");
> -			return;
> -		}
> -	}
> -
> -	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> -		memblock_phys_free(crash_base, crash_size);
> -		return;
> -	}
> -
> -	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> -		(unsigned long)(crash_size >> 20),
> -		(unsigned long)(crash_base >> 20),
> -		(unsigned long)(total_mem >> 20));
> -
> -	crashk_res.start = crash_base;
> -	crashk_res.end   = crash_base + crash_size - 1;
> -}
> -#else
> +#ifndef CONFIG_KEXEC_CORE
>  static void __init reserve_crashkernel(void)
>  {
>  }
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index de62a722431e7db..f6b99da4ed08ecf 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -73,6 +73,9 @@ extern unsigned char *vmcoreinfo_data;
>  extern size_t vmcoreinfo_size;
>  extern u32 *vmcoreinfo_note;
>  
> +extern struct resource crashk_res;
> +extern struct resource crashk_low_res;
> +
>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>  			  void *data, size_t data_len);
>  void final_note(Elf_Word *buf);
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 0c994ae37729e1e..cd744d962f6f417 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -352,8 +352,6 @@ extern int kexec_load_disabled;
>  
>  /* Location of a reserved region to hold the crash kernel.
>   */
> -extern struct resource crashk_res;
> -extern struct resource crashk_low_res;
>  extern note_buf_t __percpu *crash_notes;
>  
>  /* flag to track if kexec reboot is in progress */
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index eb53f5ec62c900f..b23cfc0ca8905fd 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -8,6 +8,12 @@
>  #include <linux/crash_core.h>
>  #include <linux/utsname.h>
>  #include <linux/vmalloc.h>
> +#include <linux/memblock.h>
> +#include <linux/swiotlb.h>
> +
> +#ifdef CONFIG_KEXEC_CORE
> +#include <asm/kexec.h>
> +#endif
>  
>  #include <asm/page.h>
>  #include <asm/sections.h>
> @@ -22,6 +28,22 @@ u32 *vmcoreinfo_note;
>  /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
>  static unsigned char *vmcoreinfo_data_safecopy;
>  
> +/* Location of the reserved area for the crash kernel */
> +struct resource crashk_res = {
> +	.name  = "Crash kernel",
> +	.start = 0,
> +	.end   = 0,
> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> +	.desc  = IORES_DESC_CRASH_KERNEL
> +};
> +struct resource crashk_low_res = {
> +	.name  = "Crash kernel",
> +	.start = 0,
> +	.end   = 0,
> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> +	.desc  = IORES_DESC_CRASH_KERNEL
> +};
> +
>  /*
>   * parsing the "crashkernel" commandline
>   *
> @@ -295,6 +317,140 @@ int __init parse_crashkernel_low(char *cmdline,
>  				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
>  }
>  
> +/*
> + * --------- Crashkernel reservation ------------------------------
> + */
> +
> +#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
> +static int __init reserve_crashkernel_low(void)
> +{
> +#ifdef CONFIG_64BIT
> +	unsigned long long base, low_base = 0, low_size = 0;
> +	unsigned long low_mem_limit;
> +	int ret;
> +
> +	low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
> +
> +	/* crashkernel=Y,low */
> +	ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
> +	if (ret) {
> +		/*
> +		 * two parts from kernel/dma/swiotlb.c:
> +		 * -swiotlb size: user-specified with swiotlb= or default.
> +		 *
> +		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> +		 * to 8M for other buffers that may need to stay low too. Also
> +		 * make sure we allocate enough extra low memory so that we
> +		 * don't run out of DMA buffers for 32-bit devices.
> +		 */
> +		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
> +	} else {
> +		/* passed with crashkernel=0,low ? */
> +		if (!low_size)
> +			return 0;
> +	}
> +
> +	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> +			CRASH_ADDR_LOW_MAX);
> +	if (!low_base) {
> +		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
> +		       (unsigned long)(low_size >> 20));
> +		return -ENOMEM;
> +	}
> +
> +	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
> +		(unsigned long)(low_size >> 20),
> +		(unsigned long)(low_base >> 20),
> +		(unsigned long)(low_mem_limit >> 20));
> +
> +	crashk_low_res.start = low_base;
> +	crashk_low_res.end   = low_base + low_size - 1;
> +#endif
> +	return 0;
> +}
> +
> +/*
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +void __init reserve_crashkernel(void)
> +{
> +	unsigned long long crash_size, crash_base, total_mem;
> +	bool high = false;
> +	int ret;
> +
> +	total_mem = memblock_phys_mem_size();
> +
> +	/* crashkernel=XM */
> +	ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
> +	if (ret != 0 || crash_size <= 0) {
> +		/* crashkernel=X,high */
> +		ret = parse_crashkernel_high(boot_command_line, total_mem,
> +					     &crash_size, &crash_base);
> +		if (ret != 0 || crash_size <= 0)
> +			return;
> +		high = true;
> +	}
> +
> +	/* 0 means: find the address automatically */
> +	if (!crash_base) {
> +		/*
> +		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> +		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> +		 * also allocates 256M extra low memory for DMA buffers
> +		 * and swiotlb.
> +		 * But the extra memory is not required for all machines.
> +		 * So try low memory first and fall back to high memory
> +		 * unless "crashkernel=size[KMG],high" is specified.
> +		 */
> +		if (!high)
> +			crash_base = memblock_phys_alloc_range(crash_size,
> +						CRASH_ALIGN, CRASH_ALIGN,
> +						CRASH_ADDR_LOW_MAX);
> +		if (!crash_base)
> +			crash_base = memblock_phys_alloc_range(crash_size,
> +						CRASH_ALIGN, CRASH_ALIGN,
> +						CRASH_ADDR_HIGH_MAX);
> +		if (!crash_base) {
> +			pr_info("crashkernel reservation failed - No suitable area found.\n");
> +			return;
> +		}
> +	} else {
> +		/* User specifies base address explicitly. */
If you plan to repost, please take above sentence off either. Then we
can say this patch is only doing code moving.

> +		unsigned long long start;
> +

OK, I can see that this patch is only moving code, and introducing
CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL to wrap them appropriately, no
extra functionality change added or removed, except of this place.
An alignment checking is added for the user specified base address.
I love this checking. While I have to say it will be more perfect if
it's put in another small patch, that will be look much better from
patch splitting and reviewing point of view.

This whole patch looks great to me, thanks for the effort.


> +		if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
> +			pr_warn("cannot reserve crashkernel: base address is not %ldMB aligned\n",
> +				(unsigned long)CRASH_ALIGN >> 20);
> +			return;
> +		}
> +
> +		start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
> +						  crash_base + crash_size);
> +		if (start != crash_base) {
> +			pr_info("crashkernel reservation failed - memory is in use.\n");
> +			return;
> +		}
> +	}
> +
> +	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> +		memblock_phys_free(crash_base, crash_size);
> +		return;
> +	}
> +
> +	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
> +		(unsigned long)(crash_size >> 20),
> +		(unsigned long)(crash_base >> 20),
> +		(unsigned long)(total_mem >> 20));
> +
> +	crashk_res.start = crash_base;
> +	crashk_res.end   = crash_base + crash_size - 1;
> +}
> +#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */
> +
>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>  			  void *data, size_t data_len)
>  {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5a5d192a89ac307..1e0d4909bbb6b77 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -54,23 +54,6 @@ note_buf_t __percpu *crash_notes;
>  /* Flag to indicate we are going to kexec a new kernel */
>  bool kexec_in_progress = false;
>  
> -
> -/* Location of the reserved area for the crash kernel */
> -struct resource crashk_res = {
> -	.name  = "Crash kernel",
> -	.start = 0,
> -	.end   = 0,
> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> -	.desc  = IORES_DESC_CRASH_KERNEL
> -};
> -struct resource crashk_low_res = {
> -	.name  = "Crash kernel",
> -	.start = 0,
> -	.end   = 0,
> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> -	.desc  = IORES_DESC_CRASH_KERNEL
> -};
> -
>  int kexec_should_crash(struct task_struct *p)
>  {
>  	/*
> -- 
> 2.25.1
>
Leizhen (ThunderTown) Dec. 14, 2021, 11:40 a.m. UTC | #24
On 2021/12/10 14:55, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> We will make the functions reserve_crashkernel() as generic, the
> xen_pv_domain() check in reserve_crashkernel() is relevant only to
> x86, the same as insert_resource() in reserve_crashkernel[_low]().
> So move xen_pv_domain() check and insert_resource() to setup_arch()
> to keep them in x86.
> 
> Suggested-by: Mike Rapoport <rppt@kernel.org>
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> Acked-by: Baoquan He <bhe@redhat.com>
> ---
>  arch/x86/kernel/setup.c | 19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index bb2a0973b98059e..7ae00716a208f82 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
>  
>  	crashk_low_res.start = low_base;
>  	crashk_low_res.end   = low_base + low_size - 1;
> -	insert_resource(&iomem_resource, &crashk_low_res);
>  #endif
>  	return 0;
>  }
> @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
>  		high = true;
>  	}
>  
> -	if (xen_pv_domain()) {
> -		pr_info("Ignoring crashkernel for a Xen PV domain\n");
> -		return;
> -	}
> -
>  	/* 0 means: find the address automatically */
>  	if (!crash_base) {
>  		/*
> @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
>  
>  	crashk_res.start = crash_base;
>  	crashk_res.end   = crash_base + crash_size - 1;
> -	insert_resource(&iomem_resource, &crashk_res);
>  }
>  #else
>  static void __init reserve_crashkernel(void)
> @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
>  	 * Reserve memory for crash kernel after SRAT is parsed so that it
>  	 * won't consume hotpluggable memory.
>  	 */
> -	reserve_crashkernel();

Hi Baoquan:
  How about move "#ifdef CONFIG_KEXEC_CORE" here, so that we can remove the
empty reserve_crashkernel(). In fact, xen_pv_domain() is invoked only
when CONFIG_KEXEC_CORE is enabled before.

> +	if (xen_pv_domain())
> +		pr_info("Ignoring crashkernel for a Xen PV domain\n");
> +	else {
> +		reserve_crashkernel();
> +#ifdef CONFIG_KEXEC_CORE
> +		if (crashk_res.end > crashk_res.start)
> +			insert_resource(&iomem_resource, &crashk_res);
> +		if (crashk_low_res.end > crashk_low_res.start)
> +			insert_resource(&iomem_resource, &crashk_low_res);
> +#endif
> +	}
>  
>  	memblock_find_dma_reserve();
>  
>
Leizhen (ThunderTown) Dec. 14, 2021, 12:38 p.m. UTC | #25
On 2021/12/14 18:45, Baoquan He wrote:
>> +		/* User specifies base address explicitly. */
> If you plan to repost, please take above sentence off either. Then we
> can say this patch is only doing code moving.
> 
>> +		unsigned long long start;
>> +
> OK, I can see that this patch is only moving code, and introducing
> CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL to wrap them appropriately, no
> extra functionality change added or removed, except of this place.
> An alignment checking is added for the user specified base address.
> I love this checking. While I have to say it will be more perfect if
> it's put in another small patch, that will be look much better from
> patch splitting and reviewing point of view.

Good eye. I will put it in a new patch.

> 
> This whole patch looks great to me, thanks for the effort.
> 
>
Borislav Petkov Dec. 14, 2021, 2:24 p.m. UTC | #26
On Tue, Dec 14, 2021 at 05:56:57PM +0800, Baoquan He wrote:
> Ah, OK, I see the new paragraph from you added in below commit.

That is supposed to make it absolutely clear and explicit. There are
other hints as to what a subsequent SOB means in that document already.

Just the next section says:

"The Signed-off-by: tag indicates that the signer was involved in the
development of the patch, or that he/she was in the patch's delivery
path."

It wouldn't hurt if people would look at that doc from time to time - we
end up referring to it on a daily basis.
Borislav Petkov Dec. 14, 2021, 7:07 p.m. UTC | #27
On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> The lower bounds of crash kernel reservation and crash kernel low
> reservation are different, use the consistent value CRASH_ALIGN.

A big WHY is missing here to explain why the lower bound of the
allocation range needs to be 16M and why was 0 wrong?

> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 5cc60996eac56d6..6424ee4f23da2cf 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
>  			return 0;
>  	}
>  
> -	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
> +	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> +			CRASH_ADDR_LOW_MAX);

You don't have to break this line.

Thx.
Catalin Marinas Dec. 14, 2021, 7:24 p.m. UTC | #28
On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > From: Chen Zhou <chenzhou10@huawei.com>
> > 
> > The lower bounds of crash kernel reservation and crash kernel low
> > reservation are different, use the consistent value CRASH_ALIGN.
> 
> A big WHY is missing here to explain why the lower bound of the
> allocation range needs to be 16M and why was 0 wrong?

I asked the same here:

https://lore.kernel.org/r/20210224143547.GB28965@arm.com

IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
lower part, so that's equivalent in practice to starting from
CRASH_ALIGN.

Anyway, I agree the commit log should describe this.
Leizhen (ThunderTown) Dec. 15, 2021, 2:10 a.m. UTC | #29
On 2021/12/15 3:24, Catalin Marinas wrote:
> On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
>> On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
>>> From: Chen Zhou <chenzhou10@huawei.com>
>>>
>>> The lower bounds of crash kernel reservation and crash kernel low
>>> reservation are different, use the consistent value CRASH_ALIGN.
>>
>> A big WHY is missing here to explain why the lower bound of the
>> allocation range needs to be 16M and why was 0 wrong?
> 
> I asked the same here:
> 
> https://lore.kernel.org/r/20210224143547.GB28965@arm.com
> 
> IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
> lower part, so that's equivalent in practice to starting from
> CRASH_ALIGN.
> 
> Anyway, I agree the commit log should describe this.

OK, I will add the description.

>
Baoquan He Dec. 15, 2021, 3:42 a.m. UTC | #30
On 12/14/21 at 07:24pm, Catalin Marinas wrote:
> On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> > On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > > From: Chen Zhou <chenzhou10@huawei.com>
> > > 
> > > The lower bounds of crash kernel reservation and crash kernel low
> > > reservation are different, use the consistent value CRASH_ALIGN.
> > 
> > A big WHY is missing here to explain why the lower bound of the
> > allocation range needs to be 16M and why was 0 wrong?
> 
> I asked the same here:
> 
> https://lore.kernel.org/r/20210224143547.GB28965@arm.com
> 
> IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
> lower part, so that's equivalent in practice to starting from
> CRASH_ALIGN.

Yeah, even for i386, there's area reserved by BIOS inside low 1M.
Considering the existing alignment CRASH_ALIGN which is 16M, we
definitely have no chance to get memory starting from 0. So starting
from 16M can skip the useless memblock searching, and make the
crashkernel low reservation consisten with crashkernel reservation on
allocation code.

> 
> Anyway, I agree the commit log should describe this.

Yes, totally.
Baoquan He Dec. 15, 2021, 6:01 a.m. UTC | #31
On 12/14/21 at 03:24pm, Borislav Petkov wrote:
> On Tue, Dec 14, 2021 at 05:56:57PM +0800, Baoquan He wrote:
> > Ah, OK, I see the new paragraph from you added in below commit.
> 
> That is supposed to make it absolutely clear and explicit. There are
> other hints as to what a subsequent SOB means in that document already.
> 
> Just the next section says:
> 
> "The Signed-off-by: tag indicates that the signer was involved in the
> development of the patch, or that he/she was in the patch's delivery
> path."
> 
> It wouldn't hurt if people would look at that doc from time to time - we
> end up referring to it on a daily basis.


Thanks, I need read this completely, and often check if anything new
is added.
Leizhen (ThunderTown) Dec. 15, 2021, 8:56 a.m. UTC | #32
On 2021/12/14 19:40, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/12/10 14:55, Zhen Lei wrote:
>> From: Chen Zhou <chenzhou10@huawei.com>
>>
>> We will make the functions reserve_crashkernel() as generic, the
>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>> x86, the same as insert_resource() in reserve_crashkernel[_low]().
>> So move xen_pv_domain() check and insert_resource() to setup_arch()
>> to keep them in x86.
>>
>> Suggested-by: Mike Rapoport <rppt@kernel.org>
>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
>> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
>> Acked-by: Baoquan He <bhe@redhat.com>
>> ---
>>  arch/x86/kernel/setup.c | 19 +++++++++++--------
>>  1 file changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index bb2a0973b98059e..7ae00716a208f82 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
>>  
>>  	crashk_low_res.start = low_base;
>>  	crashk_low_res.end   = low_base + low_size - 1;
>> -	insert_resource(&iomem_resource, &crashk_low_res);
>>  #endif
>>  	return 0;
>>  }
>> @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
>>  		high = true;
>>  	}
>>  
>> -	if (xen_pv_domain()) {
>> -		pr_info("Ignoring crashkernel for a Xen PV domain\n");
>> -		return;
>> -	}
>> -
>>  	/* 0 means: find the address automatically */
>>  	if (!crash_base) {
>>  		/*
>> @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
>>  
>>  	crashk_res.start = crash_base;
>>  	crashk_res.end   = crash_base + crash_size - 1;
>> -	insert_resource(&iomem_resource, &crashk_res);
>>  }
>>  #else
>>  static void __init reserve_crashkernel(void)
>> @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
>>  	 * Reserve memory for crash kernel after SRAT is parsed so that it
>>  	 * won't consume hotpluggable memory.
>>  	 */
>> -	reserve_crashkernel();
> 
> Hi Baoquan:
>   How about move "#ifdef CONFIG_KEXEC_CORE" here, so that we can remove the
> empty reserve_crashkernel(). In fact, xen_pv_domain() is invoked only
> when CONFIG_KEXEC_CORE is enabled before.

Hi Baoquan:
  Did you miss this email? If no reply, I will keep it no change.

> 
>> +	if (xen_pv_domain())
>> +		pr_info("Ignoring crashkernel for a Xen PV domain\n");
>> +	else {
>> +		reserve_crashkernel();
>> +#ifdef CONFIG_KEXEC_CORE
>> +		if (crashk_res.end > crashk_res.start)
>> +			insert_resource(&iomem_resource, &crashk_res);
>> +		if (crashk_low_res.end > crashk_low_res.start)
>> +			insert_resource(&iomem_resource, &crashk_low_res);
>> +#endif
>> +	}
>>  
>>  	memblock_find_dma_reserve();
>>  
>>
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> .
>
Baoquan He Dec. 15, 2021, 9:22 a.m. UTC | #33
On 12/15/21 at 04:56pm, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/12/14 19:40, Leizhen (ThunderTown) wrote:
> > 
> > 
> > On 2021/12/10 14:55, Zhen Lei wrote:
> >> From: Chen Zhou <chenzhou10@huawei.com>
> >>
> >> We will make the functions reserve_crashkernel() as generic, the
> >> xen_pv_domain() check in reserve_crashkernel() is relevant only to
> >> x86, the same as insert_resource() in reserve_crashkernel[_low]().
> >> So move xen_pv_domain() check and insert_resource() to setup_arch()
> >> to keep them in x86.
> >>
> >> Suggested-by: Mike Rapoport <rppt@kernel.org>
> >> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >> Tested-by: John Donnelly <John.p.donnelly@oracle.com>
> >> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> >> Acked-by: Baoquan He <bhe@redhat.com>
> >> ---
> >>  arch/x86/kernel/setup.c | 19 +++++++++++--------
> >>  1 file changed, 11 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >> index bb2a0973b98059e..7ae00716a208f82 100644
> >> --- a/arch/x86/kernel/setup.c
> >> +++ b/arch/x86/kernel/setup.c
> >> @@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
> >>  
> >>  	crashk_low_res.start = low_base;
> >>  	crashk_low_res.end   = low_base + low_size - 1;
> >> -	insert_resource(&iomem_resource, &crashk_low_res);
> >>  #endif
> >>  	return 0;
> >>  }
> >> @@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
> >>  		high = true;
> >>  	}
> >>  
> >> -	if (xen_pv_domain()) {
> >> -		pr_info("Ignoring crashkernel for a Xen PV domain\n");
> >> -		return;
> >> -	}
> >> -
> >>  	/* 0 means: find the address automatically */
> >>  	if (!crash_base) {
> >>  		/*
> >> @@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
> >>  
> >>  	crashk_res.start = crash_base;
> >>  	crashk_res.end   = crash_base + crash_size - 1;
> >> -	insert_resource(&iomem_resource, &crashk_res);
> >>  }
> >>  #else
> >>  static void __init reserve_crashkernel(void)
> >> @@ -1143,7 +1136,17 @@ void __init setup_arch(char **cmdline_p)
> >>  	 * Reserve memory for crash kernel after SRAT is parsed so that it
> >>  	 * won't consume hotpluggable memory.
> >>  	 */
> >> -	reserve_crashkernel();
> > 
> > Hi Baoquan:
> >   How about move "#ifdef CONFIG_KEXEC_CORE" here, so that we can remove the
> > empty reserve_crashkernel(). In fact, xen_pv_domain() is invoked only
> > when CONFIG_KEXEC_CORE is enabled before.
> 
> Hi Baoquan:
>   Did you miss this email? If no reply, I will keep it no change.

I checked this patch, and it's no update since I acked it. 

Moving reserve_crashkernel() into the CONFIG_KEXEC_CORE ifdeffery is
also fine to me.
Catalin Marinas Dec. 15, 2021, 11:01 a.m. UTC | #34
On Wed, Dec 15, 2021 at 11:42:19AM +0800, Baoquan He wrote:
> On 12/14/21 at 07:24pm, Catalin Marinas wrote:
> > On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> > > On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > > > From: Chen Zhou <chenzhou10@huawei.com>
> > > > 
> > > > The lower bounds of crash kernel reservation and crash kernel low
> > > > reservation are different, use the consistent value CRASH_ALIGN.
> > > 
> > > A big WHY is missing here to explain why the lower bound of the
> > > allocation range needs to be 16M and why was 0 wrong?
> > 
> > I asked the same here:
> > 
> > https://lore.kernel.org/r/20210224143547.GB28965@arm.com
> > 
> > IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
> > lower part, so that's equivalent in practice to starting from
> > CRASH_ALIGN.
> 
> Yeah, even for i386, there's area reserved by BIOS inside low 1M.
> Considering the existing alignment CRASH_ALIGN which is 16M, we
> definitely have no chance to get memory starting from 0. So starting
> from 16M can skip the useless memblock searching, and make the
> crashkernel low reservation consisten with crashkernel reservation on
> allocation code.

That's the x86 assumption. Is it valid for other architectures once the
code has been made generic in patch 6? It should be ok for arm64, RAM
tends to start from higher up but other architectures may start using
this common code.

If you want to keep the same semantics as before, just leave it as 0.
It's not that the additional lower bound makes the search slower.
Baoquan He Dec. 15, 2021, 11:16 a.m. UTC | #35
On 12/15/21 at 11:01am, Catalin Marinas wrote:
> On Wed, Dec 15, 2021 at 11:42:19AM +0800, Baoquan He wrote:
> > On 12/14/21 at 07:24pm, Catalin Marinas wrote:
> > > On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
> > > > On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
> > > > > From: Chen Zhou <chenzhou10@huawei.com>
> > > > > 
> > > > > The lower bounds of crash kernel reservation and crash kernel low
> > > > > reservation are different, use the consistent value CRASH_ALIGN.
> > > > 
> > > > A big WHY is missing here to explain why the lower bound of the
> > > > allocation range needs to be 16M and why was 0 wrong?
> > > 
> > > I asked the same here:
> > > 
> > > https://lore.kernel.org/r/20210224143547.GB28965@arm.com
> > > 
> > > IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
> > > lower part, so that's equivalent in practice to starting from
> > > CRASH_ALIGN.
> > 
> > Yeah, even for i386, there's area reserved by BIOS inside low 1M.
> > Considering the existing alignment CRASH_ALIGN which is 16M, we
> > definitely have no chance to get memory starting from 0. So starting
> > from 16M can skip the useless memblock searching, and make the
> > crashkernel low reservation consisten with crashkernel reservation on
> > allocation code.
> 
> That's the x86 assumption. Is it valid for other architectures once the
> code has been made generic in patch 6? It should be ok for arm64, RAM
> tends to start from higher up but other architectures may start using
> this common code.

Good point. I didn't think of this from generic code side, then let's
keep it as 0.

> 
> If you want to keep the same semantics as before, just leave it as 0.
> It's not that the additional lower bound makes the search slower.

Agree.
Leizhen (ThunderTown) Dec. 15, 2021, 11:45 a.m. UTC | #36
On 2021/12/15 19:16, Baoquan He wrote:
> On 12/15/21 at 11:01am, Catalin Marinas wrote:
>> On Wed, Dec 15, 2021 at 11:42:19AM +0800, Baoquan He wrote:
>>> On 12/14/21 at 07:24pm, Catalin Marinas wrote:
>>>> On Tue, Dec 14, 2021 at 08:07:58PM +0100, Borislav Petkov wrote:
>>>>> On Fri, Dec 10, 2021 at 02:55:25PM +0800, Zhen Lei wrote:
>>>>>> From: Chen Zhou <chenzhou10@huawei.com>
>>>>>>
>>>>>> The lower bounds of crash kernel reservation and crash kernel low
>>>>>> reservation are different, use the consistent value CRASH_ALIGN.
>>>>>
>>>>> A big WHY is missing here to explain why the lower bound of the
>>>>> allocation range needs to be 16M and why was 0 wrong?
>>>>
>>>> I asked the same here:
>>>>
>>>> https://lore.kernel.org/r/20210224143547.GB28965@arm.com
>>>>
>>>> IIRC Baoquan said that there is a 1MB reserved for x86 anyway in the
>>>> lower part, so that's equivalent in practice to starting from
>>>> CRASH_ALIGN.
>>>
>>> Yeah, even for i386, there's area reserved by BIOS inside low 1M.
>>> Considering the existing alignment CRASH_ALIGN which is 16M, we
>>> definitely have no chance to get memory starting from 0. So starting
>>> from 16M can skip the useless memblock searching, and make the
>>> crashkernel low reservation consisten with crashkernel reservation on
>>> allocation code.
>>
>> That's the x86 assumption. Is it valid for other architectures once the
>> code has been made generic in patch 6? It should be ok for arm64, RAM
>> tends to start from higher up but other architectures may start using
>> this common code.
> 
> Good point. I didn't think of this from generic code side, then let's
> keep it as 0.
> 
>>
>> If you want to keep the same semantics as before, just leave it as 0.
>> It's not that the additional lower bound makes the search slower.
> 
> Agree.

OK, I will drop this patch.

> 
> .
>
Borislav Petkov Dec. 15, 2021, 1:28 p.m. UTC | #37
On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote:
> @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
>  		}
>  	}
>  
> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> +	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>  		memblock_phys_free(crash_base, crash_size);
>  		return;
>  	}

That's not a equivalent transformation on X86_32.
Baoquan He Dec. 16, 2021, 1:10 a.m. UTC | #38
On 12/15/21 at 02:28pm, Borislav Petkov wrote:
> On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote:
> > @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
> >  		}
> >  	}
> >  
> > -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> > +	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
> >  		memblock_phys_free(crash_base, crash_size);
> >  		return;
> >  	}
> 
> That's not a equivalent transformation on X86_32.

reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent
transformation for x86_32 doesn't matter, I think.
Leizhen (ThunderTown) Dec. 16, 2021, 2:46 a.m. UTC | #39
On 2021/12/16 9:10, Baoquan He wrote:
> On 12/15/21 at 02:28pm, Borislav Petkov wrote:
>> On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote:
>>> @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
>>>  		}
>>>  	}
>>>  
>>> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>>> +	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>>>  		memblock_phys_free(crash_base, crash_size);
>>>  		return;
>>>  	}
>>
>> That's not a equivalent transformation on X86_32.

The original value (1ULL << 32) is inaccurate, and it enlarged the CRASH_ADDR_LOW
upper limit. This is because when the memory is allocated from the low end,
the address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch. If
the memory is allocated from the high end, 'crash_base' is greater than or
equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX.

I think I should update the description, thanks.

                if (!high)
                        crash_base = memblock_phys_alloc_range(crash_size,
                                                CRASH_ALIGN, CRASH_ALIGN,
                                                CRASH_ADDR_LOW_MAX);
                if (!crash_base)
                        crash_base = memblock_phys_alloc_range(crash_size,
                                                CRASH_ALIGN, CRASH_ALIGN,
                                                CRASH_ADDR_HIGH_MAX);

> 
> reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent
> transformation for x86_32 doesn't matter, I think.
> 
> .
>
Borislav Petkov Dec. 16, 2021, 10:55 a.m. UTC | #40
On Thu, Dec 16, 2021 at 09:10:40AM +0800, Baoquan He wrote:
> reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent
> transformation for x86_32 doesn't matter, I think.

That is, of course, very obvious... not!

Why is that function parsing crashkernel=XM, and crashkernel=X,high,
then it attempts some low memory reservation too? Why isn't
crashkernel=Y,low parsed there too?

I guess this alludes to why:

        crashkernel=size[KMG],low
                        [KNL, X86-64] range under 4G. When crashkernel=X,high
                        is passed, kernel could allocate physical memory region
                        above 4G, that cause second kernel crash on system
                        that require some amount of low memory, e.g. swiotlb
                        requires at least 64M+32K low memory, also enough extra
                        low memory is needed to make sure DMA buffers for 32-bit
                        devices won't run out.

So, before this is going anywhere, I'd like to see this function
documented properly. I see Documentation/admin-guide/kdump/kdump.rst
explains the crashkernel= options too so you can refer to it in the
comments so that when someone looks at that code, someone can follow why
it is doing what it is doing.

Then, as a future work, all that parsing of crashkernel= cmdline options
should be concentrated at the beginning and once it is clear what the
user requests, the reservations should be done.

As it is, reserve_crashkernel() is pretty unwieldy and hard to read.

Thx.
Borislav Petkov Dec. 16, 2021, 11:07 a.m. UTC | #41
On Thu, Dec 16, 2021 at 10:46:12AM +0800, Leizhen (ThunderTown) wrote:
> The original value (1ULL << 32) is inaccurate

I keep asking *why*?

> and it enlarged the CRASH_ADDR_LOW upper limit.

$ git grep -E "CRASH_ADDR_LOW\W"
$

I have no clue what you mean here.

> This is because when the memory is allocated from the low end, the
> address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch.

> If
> the memory is allocated from the high end, 'crash_base' is greater than or
> equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX.
> 
> I think I should update the description, thanks.

I think you should explain why is (1ULL << 32) wrong.

It came from:

  eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed")

which simply frees the high memory portion when the low reservation
fails. And the test for that is, is crash base > 4G. So that makes
perfect sense to me.

So your change is a NOP on 64-bit and it is a NOP on 32-bit by virtue of
the _low() variant always returning 0 on 32-bit.
Borislav Petkov Dec. 16, 2021, 11:17 a.m. UTC | #42
On Fri, Dec 10, 2021 at 02:55:28PM +0800, Zhen Lei wrote:
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +void __init reserve_crashkernel(void)

As I've already alluded to in another mail, ontop of this there should
be a patch or multiple patches which clean this up more and perhaps even
split it into separate functions doing stuff in this order:

1. Parse all crashkernel= cmdline options

2. Do all crash_base, crash_size etc checks

3. Do the memory reservations

And all that supplied with comments explaining why stuff is being done.

This set of functions is a mess and there's no better time for cleaning
it up and documenting it properly than when you move it to generic code.

Thx.
Leizhen (ThunderTown) Dec. 16, 2021, 12:08 p.m. UTC | #43
On 2021/12/16 19:07, Borislav Petkov wrote:
> On Thu, Dec 16, 2021 at 10:46:12AM +0800, Leizhen (ThunderTown) wrote:
>> The original value (1ULL << 32) is inaccurate
> 
> I keep asking *why*?
> 
>> and it enlarged the CRASH_ADDR_LOW upper limit.
> 
> $ git grep -E "CRASH_ADDR_LOW\W"
> $
> 
> I have no clue what you mean here.

#ifdef CONFIG_X86_32
# define CRASH_ADDR_LOW_MAX     SZ_512M
# define CRASH_ADDR_HIGH_MAX    SZ_512M
#endif

		if (!high)
(1)                     crash_base = memblock_phys_alloc_range(crash_size,
                                                CRASH_ALIGN, CRASH_ALIGN,
                                                CRASH_ADDR_LOW_MAX);
                if (!crash_base)
(2)                     crash_base = memblock_phys_alloc_range(crash_size,
                                                CRASH_ALIGN, CRASH_ALIGN,
                                                CRASH_ADDR_HIGH_MAX);

-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low())
+(3)	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low())

If the memory of 'crash_base' is successfully allocated at (1), because the last
parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that
"crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be
invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW
upper limit.

If the memory of 'crash_base' is successfully allocated at (2), you see,
CRASH_ADDR_HIGH_MAX = CRASH_ADDR_LOW_MAX = SZ_512M, the same as (1). In fact,
"crashkernel=high," may not be recommended on X86_32.

Is it possible that (CRASH_ADDR_HIGH_MAX >= 4G) and (CRASH_ADDR_LOW_MAX < 4G)?
In this case, the memory allocated at (2) maybe over 4G. But why shouldn't
CRASH_ADDR_LOW_MAX be equal to 4G at this point?


> 
>> This is because when the memory is allocated from the low end, the
>> address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch.
> 
>> If
>> the memory is allocated from the high end, 'crash_base' is greater than or
>> equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX.
>>
>> I think I should update the description, thanks.
> 
> I think you should explain why is (1ULL << 32) wrong.
> 
> It came from:
> 
>   eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed")
> 
> which simply frees the high memory portion when the low reservation
> fails. And the test for that is, is crash base > 4G. So that makes
> perfect sense to me.
> 
> So your change is a NOP on 64-bit and it is a NOP on 32-bit by virtue of
> the _low() variant always returning 0 on 32-bit.
>
Leizhen (ThunderTown) Dec. 16, 2021, 12:23 p.m. UTC | #44
On 2021/12/16 20:08, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/12/16 19:07, Borislav Petkov wrote:
>> On Thu, Dec 16, 2021 at 10:46:12AM +0800, Leizhen (ThunderTown) wrote:
>>> The original value (1ULL << 32) is inaccurate
>>
>> I keep asking *why*?
>>
>>> and it enlarged the CRASH_ADDR_LOW upper limit.
>>
>> $ git grep -E "CRASH_ADDR_LOW\W"
>> $
>>
>> I have no clue what you mean here.
> 
> #ifdef CONFIG_X86_32
> # define CRASH_ADDR_LOW_MAX     SZ_512M
> # define CRASH_ADDR_HIGH_MAX    SZ_512M
> #endif
> 
> 		if (!high)
> (1)                     crash_base = memblock_phys_alloc_range(crash_size,
>                                                 CRASH_ALIGN, CRASH_ALIGN,
>                                                 CRASH_ADDR_LOW_MAX);
>                 if (!crash_base)
> (2)                     crash_base = memblock_phys_alloc_range(crash_size,
>                                                 CRASH_ALIGN, CRASH_ALIGN,
>                                                 CRASH_ADDR_HIGH_MAX);
> 
> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low())
> +(3)	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low())
> 
> If the memory of 'crash_base' is successfully allocated at (1), because the last
> parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that
> "crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be
> invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW
> upper limit.
> 
> If the memory of 'crash_base' is successfully allocated at (2), you see,
> CRASH_ADDR_HIGH_MAX = CRASH_ADDR_LOW_MAX = SZ_512M, the same as (1). In fact,
> "crashkernel=high," may not be recommended on X86_32.
> 
> Is it possible that (CRASH_ADDR_HIGH_MAX >= 4G) and (CRASH_ADDR_LOW_MAX < 4G)?
> In this case, the memory allocated at (2) maybe over 4G. But why shouldn't
> CRASH_ADDR_LOW_MAX be equal to 4G at this point?

We divide two memory areas: low memory area and high memory area. The doc told us:
at least 256MB memory should be reserved at low memory area. So that if
"crash_base >= CRASH_ADDR_LOW_MAX" is true at (3), that means we have not reserved
any memory at low memory area, so we should call reserve_crashkernel_low().
The low memory area is not equivalent to <=4G, I think. So replace (1ULL << 32) with
CRASH_ADDR_LOW_MAX is logically correct.

> 
> 
>>
>>> This is because when the memory is allocated from the low end, the
>>> address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch.
>>
>>> If
>>> the memory is allocated from the high end, 'crash_base' is greater than or
>>> equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX.
>>>
>>> I think I should update the description, thanks.
>>
>> I think you should explain why is (1ULL << 32) wrong.
>>
>> It came from:
>>
>>   eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed")
>>
>> which simply frees the high memory portion when the low reservation
>> fails. And the test for that is, is crash base > 4G. So that makes
>> perfect sense to me.
>>
>> So your change is a NOP on 64-bit and it is a NOP on 32-bit by virtue of
>> the _low() variant always returning 0 on 32-bit.
>>
Leizhen (ThunderTown) Dec. 16, 2021, 1:15 p.m. UTC | #45
On 2021/12/16 19:17, Borislav Petkov wrote:
> On Fri, Dec 10, 2021 at 02:55:28PM +0800, Zhen Lei wrote:
>> + * reserve_crashkernel() - reserves memory for crash kernel
>> + *
>> + * This function reserves memory area given in "crashkernel=" kernel command
>> + * line parameter. The memory reserved is used by dump capture kernel when
>> + * primary kernel is crashing.
>> + */
>> +void __init reserve_crashkernel(void)
> 
> As I've already alluded to in another mail, ontop of this there should
> be a patch or multiple patches which clean this up more and perhaps even
> split it into separate functions doing stuff in this order:
> 
> 1. Parse all crashkernel= cmdline options
> 
> 2. Do all crash_base, crash_size etc checks
> 
> 3. Do the memory reservations
> 
> And all that supplied with comments explaining why stuff is being done.

I agree with you. This makes the code look clear. I will do it, try to
post v18 next Monday.

> 
> This set of functions is a mess and there's no better time for cleaning
> it up and documenting it properly than when you move it to generic code.
> 
> Thx.
>
Baoquan He Dec. 16, 2021, 2:11 p.m. UTC | #46
On 12/16/21 at 11:55am, Borislav Petkov wrote:
> On Thu, Dec 16, 2021 at 09:10:40AM +0800, Baoquan He wrote:
> > reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent
> > transformation for x86_32 doesn't matter, I think.
> 
> That is, of course, very obvious... not!
> 
> Why is that function parsing crashkernel=XM, and crashkernel=X,high,
> then it attempts some low memory reservation too? Why isn't
> crashkernel=Y,low parsed there too?
> 
> I guess this alludes to why:
> 
>         crashkernel=size[KMG],low
>                         [KNL, X86-64] range under 4G. When crashkernel=X,high
>                         is passed, kernel could allocate physical memory region
>                         above 4G, that cause second kernel crash on system
>                         that require some amount of low memory, e.g. swiotlb
>                         requires at least 64M+32K low memory, also enough extra
>                         low memory is needed to make sure DMA buffers for 32-bit
>                         devices won't run out.
> 
> So, before this is going anywhere, I'd like to see this function
> documented properly. I see Documentation/admin-guide/kdump/kdump.rst
> explains the crashkernel= options too so you can refer to it in the
> comments so that when someone looks at that code, someone can follow why
> it is doing what it is doing.
> 
> Then, as a future work, all that parsing of crashkernel= cmdline options
> should be concentrated at the beginning and once it is clear what the
> user requests, the reservations should be done.

Totally agree we should refactor code to make reserve_crashkernel()
clearer on logic and readibility. In this patchset, we can rewrite the
kernel-doc of reserve_crashkernel() to add more words to explain. As for
the code refactoring, it can be done in another patchset.

> 
> As it is, reserve_crashkernel() is pretty unwieldy and hard to read.
Borislav Petkov Dec. 16, 2021, 2:48 p.m. UTC | #47
On Thu, Dec 16, 2021 at 08:08:30PM +0800, Leizhen (ThunderTown) wrote:
> If the memory of 'crash_base' is successfully allocated at (1), because the last
> parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that
> "crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be
> invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW
> upper limit.

No, this is actually wrong - that check *must* be 4G. See:

  eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed")

It is even documented:

        crashkernel=size[KMG],low
                        [KNL, X86-64] range under 4G. When crashkernel=X,high
                        is passed, kernel could allocate physical memory region
                        above 4G, that cause second kernel crash on system
                        that require some amount of low memory, e.g. swiotlb
                        requires at least 64M+32K low memory, also enough extra
                        low memory is needed to make sure DMA buffers for 32-bit
                        devices won't run out.

so you need to do a low allocation for DMA *when* the reserved memory is
above 4G. *NOT* above 512M. But that works due to the obscure situation,
as Baoquan stated, that reserve_crashkernel_low() returns 0 on 32-bit.

So all this is telling us is that that function needs serious cleanup.
Borislav Petkov Dec. 16, 2021, 2:51 p.m. UTC | #48
On Thu, Dec 16, 2021 at 09:15:18PM +0800, Leizhen (ThunderTown) wrote:
> I agree with you. This makes the code look clear. I will do it, try to
> post v18 next Monday.

Don't rush it: take your time, do it nice and clean, make sure each
patch does one logical thing only so that there are no bugs introduced
and the pile can is reviewable.

Thx.
Borislav Petkov Dec. 16, 2021, 2:58 p.m. UTC | #49
On Thu, Dec 16, 2021 at 10:11:15PM +0800, Baoquan He wrote:
>  As for the code refactoring, it can be done in another patchset.

Well, I said "future work" before having seen where this patchset is
going. So no, in that case, the usual kernel development process is:
cleanups/fixes first - new features later.

You can see for yourself that piling more crap on what is an already
unreadable mess is only going to make it worse. So, in order to avoid
the maintenance nightmare, we clean up, streamline, document and then
add the new feature and all is shiny.

Thx.
Leizhen (ThunderTown) Dec. 17, 2021, 2:51 a.m. UTC | #50
On 2021/12/16 22:48, Borislav Petkov wrote:
> On Thu, Dec 16, 2021 at 08:08:30PM +0800, Leizhen (ThunderTown) wrote:
>> If the memory of 'crash_base' is successfully allocated at (1), because the last
>> parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that
>> "crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be
>> invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW
>> upper limit.
> 
> No, this is actually wrong - that check *must* be 4G. See:
> 
>   eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed")
> 
> It is even documented:
> 
>         crashkernel=size[KMG],low
>                         [KNL, X86-64] range under 4G. When crashkernel=X,high

[KNL, X86-64], This doc is for X86-64, not for X86-32

>                         is passed, kernel could allocate physical memory region
>                         above 4G, that cause second kernel crash on system
>                         that require some amount of low memory, e.g. swiotlb
>                         requires at least 64M+32K low memory, also enough extra
>                         low memory is needed to make sure DMA buffers for 32-bit
>                         devices won't run out.

vi arch/x86/kernel/setup.c +398

/*
 * Keep the crash kernel below this limit.
 *
 * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
 * due to mapping restrictions.

If there is no such restriction, we can make CRASH_ADDR_LOW_MAX equal to (1ULL << 32) minus 1 on X86_32.

> 
> so you need to do a low allocation for DMA *when* the reserved memory is
> above 4G. *NOT* above 512M. But that works due to the obscure situation,
> as Baoquan stated, that reserve_crashkernel_low() returns 0 on 32-bit.
> 
> So all this is telling us is that that function needs serious cleanup.
>
Borislav Petkov Dec. 21, 2021, 10:23 p.m. UTC | #51
On Fri, Dec 17, 2021 at 10:51:04AM +0800, Leizhen (ThunderTown) wrote:
> [KNL, X86-64], This doc is for X86-64, not for X86-32

reserve_crashkernel() runs on both.

> If there is no such restriction, we can make CRASH_ADDR_LOW_MAX equal
> to (1ULL << 32) minus 1 on X86_32.

Again, the 4G limit check is relevant only for 64-bit kernels - not
32-bit ones.