mbox series

[v22,0/9] support reserving crashkernel above 4G on arm64 kdump

Message ID 20220414115720.1887-1-thunder.leizhen@huawei.com
Headers show
Series support reserving crashkernel above 4G on arm64 kdump | expand

Message

Leizhen (ThunderTown) April 14, 2022, 11:57 a.m. UTC
Changes since [v21]:
1. Update the commit message of  patch 1 and 5.
2. Add some comments for reserve_crashkernel() in patch 5.

Thanks to Baoquan He and John Donnelly for their review comments.

Because v5.18-rc1 has added a new patch
commit  031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones")
There are many new scenarios:
1) The mappings may be block or page-level. 
2) The call to reserve_crashkernel() may or may not be deferred.
3) The the upper limit of DMA address may be 4G, or less than 4G. Or the
   upper limit of physical memory, because SMMU can do the mapping.

The code of patch 1-2, 8-9 keep no change, because the above-mentioned issues are not involved.
The code of patch 5 only makes the following changes:
-	if (crash_base >= SZ_4G)
+	/*
+	 * When both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the
+	 * CRASH_ADDR_LOW_MAX equals the upper limit of physical memory, so
+	 * the 'crash_base' of high memory can not exceed it. To follow the
+	 * description of "crashkernel=X,high" option, add below 'high'
+	 * condition to make sure the crash low memory will be reserved.
+	 */
+	if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) {
Change SZ_4G to CRASH_ADDR_LOW_MAX, because arm64_dma_phys_limit may be less than
4G or greater than 4G. The check 'high' is used for "crashkernel=X,high" and
"(crash_base >= CRASH_ADDR_LOW_MAX)" is used for "crashkernel=X[@offset]".

Patch 3-4 to allow block mappings for memory above 4G.
Patch 6-7 to support only crash high memory or fixed memory range specified by
crashkernel=X@offset use page-level mapping, to allow other areas use block mapping.
These four patches are for performance optimization purposes. For details about the
technical feasibility analysis, please see the commit messages.

Now the implementation of arm64 is very different from that of x86. It's no longer
suitable for both of them to share code.



Changes since [v20]:
1. Check whether crashkernel=Y,low is incorrectly configured or not configured. Do different processing.
2. Share the existing description of x86. The configuration of arm64 is the same as that of x86.
3. Define the value of macro CRASH_ADDR_HIGH_MAX as memblock.current_limit, instead of MEMBLOCK_ALLOC_ACCESSIBLE.
4. To improve readability, some lightweight code adjustments have been made to reserve_craskernel(), including comments.
5. The defined value of DEFAULT_CRASH_KERNEL_LOW_SIZE reconsiders swiotlb, just like x86, to share documents.

Thanks to Baoquan He for his careful review.

The test cases are as follows: (Please update the kexec tool to the latest version)
1) crashkernel=4G						//high=4G, low=256M
2) crashkernel=4G crashkernel=512M,high crashkernel=512M,low	//high=4G, low=256M, high and low are ignored
3) crashkernel=4G crashkernel=512M,high				//high=4G, low=256M, high is ignored
4) crashkernel=4G crashkernel=512M,low				//high=4G, low=256M, low is ignored
5) crashkernel=4G@0xe0000000					//high=0G, low=0M, cannot allocate, failed
6) crashkernel=512M						//high=0G, low=512M
7) crashkernel=128M						//high=0G, low=128M
8) crashkernel=512M@0xde000000		//512M@3552M		//high=0G, low=512M
9) crashkernel=4G,high						//high=4G, low=256M
a) crashkernel=4G,high crashkernel=512M,low			//high=4G, low=512M
b) crashkernel=512M,high crashkernel=128M,low			//high=512M, low=128M
c) crashkernel=128M,high					//high=128M, low=256M
d) crashkernel=512M,low						//high=0G, low=0M, invalid
e) crashkernel=512M,high crashkernel=0,low			//high=512M, low=0M
f) crashkernel=4G,high crashkernel=ab,low			//high=0G, low=0M, invalid


Changes since [v19]:
1. Temporarily stop making reserve_crashkernel[_low]() generic. There are a
   lot of details need to be considered, which can take a long time. Because
   "make generic" does not add new functions and does not improve performance,
   maybe I should say it's just a cleanup. So by stripping it out and leaving
   it for other patches later, we can aggregate the changes to the main functions.
2. Use insert_resource() to replace request_resource(), this not only simplifies
   the code, but also reduces the differences between arm64 and x86 implementations.
3. As commit 157752d84f5d ("kexec: use Crash kernel for Crash kernel low") do for
   x86, we can also extend kexec-tools for arm64, and it's currently applied. See:
   https://www.spinics.net/lists/kexec/msg28284.html

Thank you very much, Borislav Petkov, for so many valuable comments.

Changes since [v17]: v17 --> v19
1. Patch 0001-0004
   Introduce generic parse_crashkernel_high_low() to bring the parsing of
   "crashkernel=X,high" and the parsing of "crashkernel=X,low" together,
   then use it instead of the call to parse_crashkernel_{high|low}(). Two
   confusing parameters of parse_crashkernel_{high|low}() are deleted.

   I previously sent these four patches separately:
   [1] https://lkml.org/lkml/2021/12/25/40
2. Patch 0005-0009
   Introduce generic reserve_crashkernel_mem[_low](), the implementation of
   these two functions is based on function reserve_crashkernel[_low]() in
   arch/x86/kernel/setup.c. There is no functional change for x86.
   1) The check position of xen_pv_domain() does not change.
   2) Still 1M alignment for crash kernel fixed region, when 'base' is specified.

   To avoid compilation problems on other architectures: patch 0004 moves
   the definition of global variable crashk[_low]_res from kexec_core.c to
   crash_core.c, and provide default definitions for all macros involved, a
   particular platform can redefine these macros to override the default
   values.
3. 0010, only one line of comment was changed.
4. 0011
   1) crashk_low_res may also a valid reserved memory, should be checked
      in crash_is_nosave(), see arch/arm64/kernel/machine_kexec.
   2) Drop memblock_mark_nomap() for crashk_low_res, because of:
      2687275a5843 arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required
   3) Also call kmemleak_ignore_phys() for crashk_low_res, because of:
      85f58eb18898 arm64: kdump: Skip kmemleak scan reserved memory for kdump
5. 0012, slightly rebased, because the following patch is applied in advance. 
   https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/commit/?h=dt/linus&id=8347b41748c3019157312fbe7f8a6792ae396eb7
6. 0013, no change.

Others:
1. Discard add ARCH_WANT_RESERVE_CRASH_KERNEL
2. When allocating crash low memory, the start address still starts from 0.
   low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
3. Discard change (1ULL << 32) to CRASH_ADDR_LOW_MAX.
4. Ensure the check position of xen_pv_domain() have no change.
5. Except patch 0010 and 0012, all "Tested-by", "Reviewed-by", "Acked-by" are removed.
6. Update description.



Changes since [v16]
- Because no functional changes in this version, so add
  "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9
- Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8
- Update patch 9 based on the review comments of Rob Herring
- As Catalin Marinas's suggestion, merge the implementation of
  ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
  contents of X86 and ARM64 do not overlap, and reduce unnecessary
  temporary differences.

Changes since [v15]
-  Aggregate the processing of "linux,usable-memory-range" into one function.
   Only patch 9-10 have been updated.

Changes since [v14]
- Recovering the requirement that the CrashKernel memory regions on X86
  only requires 1 MiB alignment.
- Combine patches 5 and 6 in v14 into one. The compilation warning fixed
  by patch 6 was introduced by patch 5 in v14.
- As with crashk_res, crashk_low_res is also processed by
  crash_exclude_mem_range() in patch 7.
- Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
  has removed the architecture-specific code, extend the property "linux,usable-memory-range"
  in the platform-agnostic FDT core code. See patch 9.
- Discard the x86 description update in the document, because the description
  has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
- Change "arm64" to "ARM64" in Doc.


Changes since [v13]
- Rebased on top of 5.11-rc5.
- Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
Since reserve_crashkernel[_low]() implementations are quite similar on
other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
arch/Kconfig and select this by X86 and ARM64.
- Some minor cleanup.

Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
and arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.

Changes since [v9]
- Patch 1 add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
suggested by Arnd.
- Add Tested-by from Jhon and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, reserve spcified size low
memory for crash kdump kernel devices firstly and then reserve memory above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass to crash dump kernel by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() i added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions, for two crash kernel regions
case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

v1:
There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is no enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump
kernel will boot failure because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in DMA zone and fall back to high
allocation if it fails.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically and "crashkernel=Y,low" can be used to allocate
specified size low memory.

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices. So there may be two regions reserved for
crash dump kernel.
In order to distinct from the high region and make no effect to the use
of existing kexec-tools, rename the low region as "Crash kernel (low)",
and pass the low region by reusing DT property
"linux,usable-memory-range". We made the low memory region as the last
range of "linux,usable-memory-range" to keep compatibility with existing
user-space and older kdump kernels.

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions(see [1])

Another update is document about DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema(see [2])


[1]: https://www.spinics.net/lists/kexec/msg28226.html
[2]: https://github.com/robherring/dt-schema/pull/19 
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
[v12]: https://lkml.org/lkml/2020/9/7/1037
[v13]: https://lkml.org/lkml/2020/10/31/34
[v14]: https://lkml.org/lkml/2021/1/30/53
[v15]: https://lkml.org/lkml/2021/10/19/1405
[v16]: https://lkml.org/lkml/2021/11/23/435
[v17]: https://lkml.org/lkml/2021/12/10/38
[v18]: https://lkml.org/lkml/2021/12/22/424
[v19]: https://lkml.org/lkml/2021/12/28/203
[v20]: https://lkml.org/lkml/2022/1/24/167
[v21]: https://lkml.org/lkml/2022/2/26/350

Chen Zhou (2):
  arm64: kdump: Reimplement crashkernel=X
  of: fdt: Add memory for devices by DT property
    "linux,usable-memory-range"

Zhen Lei (7):
  kdump: return -ENOENT if required cmdline option does not exist
  arm64: Use insert_resource() to simplify code
  arm64: kdump: Remove some redundant checks in map_mem()
  arm64: kdump: Don't force page-level mappings for memory above 4G
  arm64: kdump: Use page-level mapping for the high memory of
    crashkernel
  arm64: kdump: Try not to use NO_BLOCK_MAPPINGS for memory under 4G
  docs: kdump: Update the crashkernel description for arm64

 .../admin-guide/kernel-parameters.txt         |   8 +-
 arch/arm64/include/asm/kexec.h                |   2 +
 arch/arm64/kernel/machine_kexec.c             |   9 +-
 arch/arm64/kernel/machine_kexec_file.c        |  12 +-
 arch/arm64/kernel/setup.c                     |  17 +-
 arch/arm64/mm/init.c                          | 204 +++++++++++++++++-
 arch/arm64/mm/mmu.c                           |  68 +++---
 drivers/of/fdt.c                              |  33 ++-
 kernel/crash_core.c                           |   3 +-
 9 files changed, 278 insertions(+), 78 deletions(-)

Comments

Dave Kleikamp April 19, 2022, 5:02 p.m. UTC | #1
For the series:

Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>

Thanks for keeping this patch set alive.

On 4/14/22 6:57AM, Zhen Lei wrote:
> Changes since [v21]:
> 1. Update the commit message of  patch 1 and 5.
> 2. Add some comments for reserve_crashkernel() in patch 5.
> 
> Thanks to Baoquan He and John Donnelly for their review comments.
> 
> Because v5.18-rc1 has added a new patch
> commit  031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones")
> There are many new scenarios:
> 1) The mappings may be block or page-level.
> 2) The call to reserve_crashkernel() may or may not be deferred.
> 3) The the upper limit of DMA address may be 4G, or less than 4G. Or the
>     upper limit of physical memory, because SMMU can do the mapping.
> 
> The code of patch 1-2, 8-9 keep no change, because the above-mentioned issues are not involved.
> The code of patch 5 only makes the following changes:
> -	if (crash_base >= SZ_4G)
> +	/*
> +	 * When both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the
> +	 * CRASH_ADDR_LOW_MAX equals the upper limit of physical memory, so
> +	 * the 'crash_base' of high memory can not exceed it. To follow the
> +	 * description of "crashkernel=X,high" option, add below 'high'
> +	 * condition to make sure the crash low memory will be reserved.
> +	 */
> +	if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) {
> Change SZ_4G to CRASH_ADDR_LOW_MAX, because arm64_dma_phys_limit may be less than
> 4G or greater than 4G. The check 'high' is used for "crashkernel=X,high" and
> "(crash_base >= CRASH_ADDR_LOW_MAX)" is used for "crashkernel=X[@offset]".
> 
> Patch 3-4 to allow block mappings for memory above 4G.
> Patch 6-7 to support only crash high memory or fixed memory range specified by
> crashkernel=X@offset use page-level mapping, to allow other areas use block mapping.
> These four patches are for performance optimization purposes. For details about the
> technical feasibility analysis, please see the commit messages.
> 
> Now the implementation of arm64 is very different from that of x86. It's no longer
> suitable for both of them to share code.
> 
> 
> 
> Changes since [v20]:
> 1. Check whether crashkernel=Y,low is incorrectly configured or not configured. Do different processing.
> 2. Share the existing description of x86. The configuration of arm64 is the same as that of x86.
> 3. Define the value of macro CRASH_ADDR_HIGH_MAX as memblock.current_limit, instead of MEMBLOCK_ALLOC_ACCESSIBLE.
> 4. To improve readability, some lightweight code adjustments have been made to reserve_craskernel(), including comments.
> 5. The defined value of DEFAULT_CRASH_KERNEL_LOW_SIZE reconsiders swiotlb, just like x86, to share documents.
> 
> Thanks to Baoquan He for his careful review.
> 
> The test cases are as follows: (Please update the kexec tool to the latest version)
> 1) crashkernel=4G						//high=4G, low=256M
> 2) crashkernel=4G crashkernel=512M,high crashkernel=512M,low	//high=4G, low=256M, high and low are ignored
> 3) crashkernel=4G crashkernel=512M,high				//high=4G, low=256M, high is ignored
> 4) crashkernel=4G crashkernel=512M,low				//high=4G, low=256M, low is ignored
> 5) crashkernel=4G@0xe0000000					//high=0G, low=0M, cannot allocate, failed
> 6) crashkernel=512M						//high=0G, low=512M
> 7) crashkernel=128M						//high=0G, low=128M
> 8) crashkernel=512M@0xde000000		//512M@3552M		//high=0G, low=512M
> 9) crashkernel=4G,high						//high=4G, low=256M
> a) crashkernel=4G,high crashkernel=512M,low			//high=4G, low=512M
> b) crashkernel=512M,high crashkernel=128M,low			//high=512M, low=128M
> c) crashkernel=128M,high					//high=128M, low=256M
> d) crashkernel=512M,low						//high=0G, low=0M, invalid
> e) crashkernel=512M,high crashkernel=0,low			//high=512M, low=0M
> f) crashkernel=4G,high crashkernel=ab,low			//high=0G, low=0M, invalid
> 
> 
> Changes since [v19]:
> 1. Temporarily stop making reserve_crashkernel[_low]() generic. There are a
>     lot of details need to be considered, which can take a long time. Because
>     "make generic" does not add new functions and does not improve performance,
>     maybe I should say it's just a cleanup. So by stripping it out and leaving
>     it for other patches later, we can aggregate the changes to the main functions.
> 2. Use insert_resource() to replace request_resource(), this not only simplifies
>     the code, but also reduces the differences between arm64 and x86 implementations.
> 3. As commit 157752d84f5d ("kexec: use Crash kernel for Crash kernel low") do for
>     x86, we can also extend kexec-tools for arm64, and it's currently applied. See:
>     https://www.spinics.net/lists/kexec/msg28284.html
> 
> Thank you very much, Borislav Petkov, for so many valuable comments.
> 
> Changes since [v17]: v17 --> v19
> 1. Patch 0001-0004
>     Introduce generic parse_crashkernel_high_low() to bring the parsing of
>     "crashkernel=X,high" and the parsing of "crashkernel=X,low" together,
>     then use it instead of the call to parse_crashkernel_{high|low}(). Two
>     confusing parameters of parse_crashkernel_{high|low}() are deleted.
> 
>     I previously sent these four patches separately:
>     [1] https://lkml.org/lkml/2021/12/25/40
> 2. Patch 0005-0009
>     Introduce generic reserve_crashkernel_mem[_low](), the implementation of
>     these two functions is based on function reserve_crashkernel[_low]() in
>     arch/x86/kernel/setup.c. There is no functional change for x86.
>     1) The check position of xen_pv_domain() does not change.
>     2) Still 1M alignment for crash kernel fixed region, when 'base' is specified.
> 
>     To avoid compilation problems on other architectures: patch 0004 moves
>     the definition of global variable crashk[_low]_res from kexec_core.c to
>     crash_core.c, and provide default definitions for all macros involved, a
>     particular platform can redefine these macros to override the default
>     values.
> 3. 0010, only one line of comment was changed.
> 4. 0011
>     1) crashk_low_res may also a valid reserved memory, should be checked
>        in crash_is_nosave(), see arch/arm64/kernel/machine_kexec.
>     2) Drop memblock_mark_nomap() for crashk_low_res, because of:
>        2687275a5843 arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required
>     3) Also call kmemleak_ignore_phys() for crashk_low_res, because of:
>        85f58eb18898 arm64: kdump: Skip kmemleak scan reserved memory for kdump
> 5. 0012, slightly rebased, because the following patch is applied in advance.
>     https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/commit/?h=dt/linus&id=8347b41748c3019157312fbe7f8a6792ae396eb7
> 6. 0013, no change.
> 
> Others:
> 1. Discard add ARCH_WANT_RESERVE_CRASH_KERNEL
> 2. When allocating crash low memory, the start address still starts from 0.
>     low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
> 3. Discard change (1ULL << 32) to CRASH_ADDR_LOW_MAX.
> 4. Ensure the check position of xen_pv_domain() have no change.
> 5. Except patch 0010 and 0012, all "Tested-by", "Reviewed-by", "Acked-by" are removed.
> 6. Update description.
> 
> 
> 
> Changes since [v16]
> - Because no functional changes in this version, so add
>    "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9
> - Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8
> - Update patch 9 based on the review comments of Rob Herring
> - As Catalin Marinas's suggestion, merge the implementation of
>    ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
>    contents of X86 and ARM64 do not overlap, and reduce unnecessary
>    temporary differences.
> 
> Changes since [v15]
> -  Aggregate the processing of "linux,usable-memory-range" into one function.
>     Only patch 9-10 have been updated.
> 
> Changes since [v14]
> - Recovering the requirement that the CrashKernel memory regions on X86
>    only requires 1 MiB alignment.
> - Combine patches 5 and 6 in v14 into one. The compilation warning fixed
>    by patch 6 was introduced by patch 5 in v14.
> - As with crashk_res, crashk_low_res is also processed by
>    crash_exclude_mem_range() in patch 7.
> - Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
>    has removed the architecture-specific code, extend the property "linux,usable-memory-range"
>    in the platform-agnostic FDT core code. See patch 9.
> - Discard the x86 description update in the document, because the description
>    has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
> - Change "arm64" to "ARM64" in Doc.
> 
> 
> Changes since [v13]
> - Rebased on top of 5.11-rc5.
> - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
> Since reserve_crashkernel[_low]() implementations are quite similar on
> other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
> arch/Kconfig and select this by X86 and ARM64.
> - Some minor cleanup.
> 
> Changes since [v12]
> - Rebased on top of 5.10-rc1.
> - Keep CRASH_ALIGN as 16M suggested by Dave.
> - Drop patch "kdump: add threshold for the required memory".
> - Add Tested-by from John.
> 
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
> 
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
> 
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
> 
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
> 
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
> 
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
> 
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
> 
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> 
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
> 
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
> 
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
> 
> v1:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
> 
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone and fall back to high
> allocation if it fails.
> 
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> specified size low memory.
> 
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
> 
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
> 
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
> 
> 
> [1]: https://www.spinics.net/lists/kexec/msg28226.html
> [2]: https://github.com/robherring/dt-schema/pull/19
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> [v8]: https://lkml.org/lkml/2020/5/21/213
> [v9]: https://lkml.org/lkml/2020/6/28/73
> [v10]: https://lkml.org/lkml/2020/7/2/1443
> [v11]: https://lkml.org/lkml/2020/8/1/150
> [v12]: https://lkml.org/lkml/2020/9/7/1037
> [v13]: https://lkml.org/lkml/2020/10/31/34
> [v14]: https://lkml.org/lkml/2021/1/30/53
> [v15]: https://lkml.org/lkml/2021/10/19/1405
> [v16]: https://lkml.org/lkml/2021/11/23/435
> [v17]: https://lkml.org/lkml/2021/12/10/38
> [v18]: https://lkml.org/lkml/2021/12/22/424
> [v19]: https://lkml.org/lkml/2021/12/28/203
> [v20]: https://lkml.org/lkml/2022/1/24/167
> [v21]: https://lkml.org/lkml/2022/2/26/350
> 
> Chen Zhou (2):
>    arm64: kdump: Reimplement crashkernel=X
>    of: fdt: Add memory for devices by DT property
>      "linux,usable-memory-range"
> 
> Zhen Lei (7):
>    kdump: return -ENOENT if required cmdline option does not exist
>    arm64: Use insert_resource() to simplify code
>    arm64: kdump: Remove some redundant checks in map_mem()
>    arm64: kdump: Don't force page-level mappings for memory above 4G
>    arm64: kdump: Use page-level mapping for the high memory of
>      crashkernel
>    arm64: kdump: Try not to use NO_BLOCK_MAPPINGS for memory under 4G
>    docs: kdump: Update the crashkernel description for arm64
> 
>   .../admin-guide/kernel-parameters.txt         |   8 +-
>   arch/arm64/include/asm/kexec.h                |   2 +
>   arch/arm64/kernel/machine_kexec.c             |   9 +-
>   arch/arm64/kernel/machine_kexec_file.c        |  12 +-
>   arch/arm64/kernel/setup.c                     |  17 +-
>   arch/arm64/mm/init.c                          | 204 +++++++++++++++++-
>   arch/arm64/mm/mmu.c                           |  68 +++---
>   drivers/of/fdt.c                              |  33 ++-
>   kernel/crash_core.c                           |   3 +-
>   9 files changed, 278 insertions(+), 78 deletions(-)
>
Leizhen (ThunderTown) April 25, 2022, 2:19 a.m. UTC | #2
Hi, Catalin Marinas and Will Deacon:
  Do you have time to review patches 3-4 and 6-7? These patches are a little more
relevant to the arm64 architecture.

Hi, Baoquan He:
  Can you review patches 1 and 5 again?


On 2022/4/14 19:57, Zhen Lei wrote:
> Changes since [v21]:
> 1. Update the commit message of  patch 1 and 5.
> 2. Add some comments for reserve_crashkernel() in patch 5.
> 
> Thanks to Baoquan He and John Donnelly for their review comments.
> 
> Because v5.18-rc1 has added a new patch
> commit  031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones")
> There are many new scenarios:
> 1) The mappings may be block or page-level. 
> 2) The call to reserve_crashkernel() may or may not be deferred.
> 3) The the upper limit of DMA address may be 4G, or less than 4G. Or the
>    upper limit of physical memory, because SMMU can do the mapping.
> 
> The code of patch 1-2, 8-9 keep no change, because the above-mentioned issues are not involved.
> The code of patch 5 only makes the following changes:
> -	if (crash_base >= SZ_4G)
> +	/*
> +	 * When both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the
> +	 * CRASH_ADDR_LOW_MAX equals the upper limit of physical memory, so
> +	 * the 'crash_base' of high memory can not exceed it. To follow the
> +	 * description of "crashkernel=X,high" option, add below 'high'
> +	 * condition to make sure the crash low memory will be reserved.
> +	 */
> +	if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) {
> Change SZ_4G to CRASH_ADDR_LOW_MAX, because arm64_dma_phys_limit may be less than
> 4G or greater than 4G. The check 'high' is used for "crashkernel=X,high" and
> "(crash_base >= CRASH_ADDR_LOW_MAX)" is used for "crashkernel=X[@offset]".
> 
> Patch 3-4 to allow block mappings for memory above 4G.
> Patch 6-7 to support only crash high memory or fixed memory range specified by
> crashkernel=X@offset use page-level mapping, to allow other areas use block mapping.
> These four patches are for performance optimization purposes. For details about the
> technical feasibility analysis, please see the commit messages.
> 
> Now the implementation of arm64 is very different from that of x86. It's no longer
> suitable for both of them to share code.
> 
> 
> 
> Changes since [v20]:
> 1. Check whether crashkernel=Y,low is incorrectly configured or not configured. Do different processing.
> 2. Share the existing description of x86. The configuration of arm64 is the same as that of x86.
> 3. Define the value of macro CRASH_ADDR_HIGH_MAX as memblock.current_limit, instead of MEMBLOCK_ALLOC_ACCESSIBLE.
> 4. To improve readability, some lightweight code adjustments have been made to reserve_craskernel(), including comments.
> 5. The defined value of DEFAULT_CRASH_KERNEL_LOW_SIZE reconsiders swiotlb, just like x86, to share documents.
> 
> Thanks to Baoquan He for his careful review.
> 
> The test cases are as follows: (Please update the kexec tool to the latest version)
> 1) crashkernel=4G						//high=4G, low=256M
> 2) crashkernel=4G crashkernel=512M,high crashkernel=512M,low	//high=4G, low=256M, high and low are ignored
> 3) crashkernel=4G crashkernel=512M,high				//high=4G, low=256M, high is ignored
> 4) crashkernel=4G crashkernel=512M,low				//high=4G, low=256M, low is ignored
> 5) crashkernel=4G@0xe0000000					//high=0G, low=0M, cannot allocate, failed
> 6) crashkernel=512M						//high=0G, low=512M
> 7) crashkernel=128M						//high=0G, low=128M
> 8) crashkernel=512M@0xde000000		//512M@3552M		//high=0G, low=512M
> 9) crashkernel=4G,high						//high=4G, low=256M
> a) crashkernel=4G,high crashkernel=512M,low			//high=4G, low=512M
> b) crashkernel=512M,high crashkernel=128M,low			//high=512M, low=128M
> c) crashkernel=128M,high					//high=128M, low=256M
> d) crashkernel=512M,low						//high=0G, low=0M, invalid
> e) crashkernel=512M,high crashkernel=0,low			//high=512M, low=0M
> f) crashkernel=4G,high crashkernel=ab,low			//high=0G, low=0M, invalid
> 
> 
> Changes since [v19]:
> 1. Temporarily stop making reserve_crashkernel[_low]() generic. There are a
>    lot of details need to be considered, which can take a long time. Because
>    "make generic" does not add new functions and does not improve performance,
>    maybe I should say it's just a cleanup. So by stripping it out and leaving
>    it for other patches later, we can aggregate the changes to the main functions.
> 2. Use insert_resource() to replace request_resource(), this not only simplifies
>    the code, but also reduces the differences between arm64 and x86 implementations.
> 3. As commit 157752d84f5d ("kexec: use Crash kernel for Crash kernel low") do for
>    x86, we can also extend kexec-tools for arm64, and it's currently applied. See:
>    https://www.spinics.net/lists/kexec/msg28284.html
> 
> Thank you very much, Borislav Petkov, for so many valuable comments.
> 
> Changes since [v17]: v17 --> v19
> 1. Patch 0001-0004
>    Introduce generic parse_crashkernel_high_low() to bring the parsing of
>    "crashkernel=X,high" and the parsing of "crashkernel=X,low" together,
>    then use it instead of the call to parse_crashkernel_{high|low}(). Two
>    confusing parameters of parse_crashkernel_{high|low}() are deleted.
> 
>    I previously sent these four patches separately:
>    [1] https://lkml.org/lkml/2021/12/25/40
> 2. Patch 0005-0009
>    Introduce generic reserve_crashkernel_mem[_low](), the implementation of
>    these two functions is based on function reserve_crashkernel[_low]() in
>    arch/x86/kernel/setup.c. There is no functional change for x86.
>    1) The check position of xen_pv_domain() does not change.
>    2) Still 1M alignment for crash kernel fixed region, when 'base' is specified.
> 
>    To avoid compilation problems on other architectures: patch 0004 moves
>    the definition of global variable crashk[_low]_res from kexec_core.c to
>    crash_core.c, and provide default definitions for all macros involved, a
>    particular platform can redefine these macros to override the default
>    values.
> 3. 0010, only one line of comment was changed.
> 4. 0011
>    1) crashk_low_res may also a valid reserved memory, should be checked
>       in crash_is_nosave(), see arch/arm64/kernel/machine_kexec.
>    2) Drop memblock_mark_nomap() for crashk_low_res, because of:
>       2687275a5843 arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required
>    3) Also call kmemleak_ignore_phys() for crashk_low_res, because of:
>       85f58eb18898 arm64: kdump: Skip kmemleak scan reserved memory for kdump
> 5. 0012, slightly rebased, because the following patch is applied in advance. 
>    https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/commit/?h=dt/linus&id=8347b41748c3019157312fbe7f8a6792ae396eb7
> 6. 0013, no change.
> 
> Others:
> 1. Discard add ARCH_WANT_RESERVE_CRASH_KERNEL
> 2. When allocating crash low memory, the start address still starts from 0.
>    low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
> 3. Discard change (1ULL << 32) to CRASH_ADDR_LOW_MAX.
> 4. Ensure the check position of xen_pv_domain() have no change.
> 5. Except patch 0010 and 0012, all "Tested-by", "Reviewed-by", "Acked-by" are removed.
> 6. Update description.
> 
> 
> 
> Changes since [v16]
> - Because no functional changes in this version, so add
>   "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9
> - Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8
> - Update patch 9 based on the review comments of Rob Herring
> - As Catalin Marinas's suggestion, merge the implementation of
>   ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
>   contents of X86 and ARM64 do not overlap, and reduce unnecessary
>   temporary differences.
> 
> Changes since [v15]
> -  Aggregate the processing of "linux,usable-memory-range" into one function.
>    Only patch 9-10 have been updated.
> 
> Changes since [v14]
> - Recovering the requirement that the CrashKernel memory regions on X86
>   only requires 1 MiB alignment.
> - Combine patches 5 and 6 in v14 into one. The compilation warning fixed
>   by patch 6 was introduced by patch 5 in v14.
> - As with crashk_res, crashk_low_res is also processed by
>   crash_exclude_mem_range() in patch 7.
> - Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
>   has removed the architecture-specific code, extend the property "linux,usable-memory-range"
>   in the platform-agnostic FDT core code. See patch 9.
> - Discard the x86 description update in the document, because the description
>   has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
> - Change "arm64" to "ARM64" in Doc.
> 
> 
> Changes since [v13]
> - Rebased on top of 5.11-rc5.
> - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
> Since reserve_crashkernel[_low]() implementations are quite similar on
> other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
> arch/Kconfig and select this by X86 and ARM64.
> - Some minor cleanup.
> 
> Changes since [v12]
> - Rebased on top of 5.10-rc1.
> - Keep CRASH_ALIGN as 16M suggested by Dave.
> - Drop patch "kdump: add threshold for the required memory".
> - Add Tested-by from John.
> 
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
> 
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
> 
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
> 
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
> 
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
> 
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
> 
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
> 
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> 
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
> 
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
> 
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
> 
> v1:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
> 
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone and fall back to high
> allocation if it fails.
> 
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> specified size low memory.
> 
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
> 
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
> 
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
> 
> 
> [1]: https://www.spinics.net/lists/kexec/msg28226.html
> [2]: https://github.com/robherring/dt-schema/pull/19 
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> [v8]: https://lkml.org/lkml/2020/5/21/213
> [v9]: https://lkml.org/lkml/2020/6/28/73
> [v10]: https://lkml.org/lkml/2020/7/2/1443
> [v11]: https://lkml.org/lkml/2020/8/1/150
> [v12]: https://lkml.org/lkml/2020/9/7/1037
> [v13]: https://lkml.org/lkml/2020/10/31/34
> [v14]: https://lkml.org/lkml/2021/1/30/53
> [v15]: https://lkml.org/lkml/2021/10/19/1405
> [v16]: https://lkml.org/lkml/2021/11/23/435
> [v17]: https://lkml.org/lkml/2021/12/10/38
> [v18]: https://lkml.org/lkml/2021/12/22/424
> [v19]: https://lkml.org/lkml/2021/12/28/203
> [v20]: https://lkml.org/lkml/2022/1/24/167
> [v21]: https://lkml.org/lkml/2022/2/26/350
> 
> Chen Zhou (2):
>   arm64: kdump: Reimplement crashkernel=X
>   of: fdt: Add memory for devices by DT property
>     "linux,usable-memory-range"
> 
> Zhen Lei (7):
>   kdump: return -ENOENT if required cmdline option does not exist
>   arm64: Use insert_resource() to simplify code
>   arm64: kdump: Remove some redundant checks in map_mem()
>   arm64: kdump: Don't force page-level mappings for memory above 4G
>   arm64: kdump: Use page-level mapping for the high memory of
>     crashkernel
>   arm64: kdump: Try not to use NO_BLOCK_MAPPINGS for memory under 4G
>   docs: kdump: Update the crashkernel description for arm64
> 
>  .../admin-guide/kernel-parameters.txt         |   8 +-
>  arch/arm64/include/asm/kexec.h                |   2 +
>  arch/arm64/kernel/machine_kexec.c             |   9 +-
>  arch/arm64/kernel/machine_kexec_file.c        |  12 +-
>  arch/arm64/kernel/setup.c                     |  17 +-
>  arch/arm64/mm/init.c                          | 204 +++++++++++++++++-
>  arch/arm64/mm/mmu.c                           |  68 +++---
>  drivers/of/fdt.c                              |  33 ++-
>  kernel/crash_core.c                           |   3 +-
>  9 files changed, 278 insertions(+), 78 deletions(-)
>
Baoquan He April 25, 2022, 2:45 a.m. UTC | #3
On 04/25/22 at 10:19am, Leizhen (ThunderTown) wrote:
> 
> Hi, Catalin Marinas and Will Deacon:
>   Do you have time to review patches 3-4 and 6-7? These patches are a little more
> relevant to the arm64 architecture.
> 
> Hi, Baoquan He:
>   Can you review patches 1 and 5 again?

Yes, and I will test and review the whole series if no one check the
arm64 related changes.

> 
> 
> On 2022/4/14 19:57, Zhen Lei wrote:
> > Changes since [v21]:
> > 1. Update the commit message of  patch 1 and 5.
> > 2. Add some comments for reserve_crashkernel() in patch 5.
> > 
> > Thanks to Baoquan He and John Donnelly for their review comments.
> > 
> > Because v5.18-rc1 has added a new patch
> > commit  031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones")
> > There are many new scenarios:
> > 1) The mappings may be block or page-level. 
> > 2) The call to reserve_crashkernel() may or may not be deferred.
> > 3) The the upper limit of DMA address may be 4G, or less than 4G. Or the
> >    upper limit of physical memory, because SMMU can do the mapping.
> > 
> > The code of patch 1-2, 8-9 keep no change, because the above-mentioned issues are not involved.
> > The code of patch 5 only makes the following changes:
> > -	if (crash_base >= SZ_4G)
> > +	/*
> > +	 * When both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the
> > +	 * CRASH_ADDR_LOW_MAX equals the upper limit of physical memory, so
> > +	 * the 'crash_base' of high memory can not exceed it. To follow the
> > +	 * description of "crashkernel=X,high" option, add below 'high'
> > +	 * condition to make sure the crash low memory will be reserved.
> > +	 */
> > +	if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) {
> > Change SZ_4G to CRASH_ADDR_LOW_MAX, because arm64_dma_phys_limit may be less than
> > 4G or greater than 4G. The check 'high' is used for "crashkernel=X,high" and
> > "(crash_base >= CRASH_ADDR_LOW_MAX)" is used for "crashkernel=X[@offset]".
> > 
> > Patch 3-4 to allow block mappings for memory above 4G.
> > Patch 6-7 to support only crash high memory or fixed memory range specified by
> > crashkernel=X@offset use page-level mapping, to allow other areas use block mapping.
> > These four patches are for performance optimization purposes. For details about the
> > technical feasibility analysis, please see the commit messages.
> > 
> > Now the implementation of arm64 is very different from that of x86. It's no longer
> > suitable for both of them to share code.
> > 
> > 
> > 
> > Changes since [v20]:
> > 1. Check whether crashkernel=Y,low is incorrectly configured or not configured. Do different processing.
> > 2. Share the existing description of x86. The configuration of arm64 is the same as that of x86.
> > 3. Define the value of macro CRASH_ADDR_HIGH_MAX as memblock.current_limit, instead of MEMBLOCK_ALLOC_ACCESSIBLE.
> > 4. To improve readability, some lightweight code adjustments have been made to reserve_craskernel(), including comments.
> > 5. The defined value of DEFAULT_CRASH_KERNEL_LOW_SIZE reconsiders swiotlb, just like x86, to share documents.
> > 
> > Thanks to Baoquan He for his careful review.
> > 
> > The test cases are as follows: (Please update the kexec tool to the latest version)
> > 1) crashkernel=4G						//high=4G, low=256M
> > 2) crashkernel=4G crashkernel=512M,high crashkernel=512M,low	//high=4G, low=256M, high and low are ignored
> > 3) crashkernel=4G crashkernel=512M,high				//high=4G, low=256M, high is ignored
> > 4) crashkernel=4G crashkernel=512M,low				//high=4G, low=256M, low is ignored
> > 5) crashkernel=4G@0xe0000000					//high=0G, low=0M, cannot allocate, failed
> > 6) crashkernel=512M						//high=0G, low=512M
> > 7) crashkernel=128M						//high=0G, low=128M
> > 8) crashkernel=512M@0xde000000		//512M@3552M		//high=0G, low=512M
> > 9) crashkernel=4G,high						//high=4G, low=256M
> > a) crashkernel=4G,high crashkernel=512M,low			//high=4G, low=512M
> > b) crashkernel=512M,high crashkernel=128M,low			//high=512M, low=128M
> > c) crashkernel=128M,high					//high=128M, low=256M
> > d) crashkernel=512M,low						//high=0G, low=0M, invalid
> > e) crashkernel=512M,high crashkernel=0,low			//high=512M, low=0M
> > f) crashkernel=4G,high crashkernel=ab,low			//high=0G, low=0M, invalid
> > 
> > 
> > Changes since [v19]:
> > 1. Temporarily stop making reserve_crashkernel[_low]() generic. There are a
> >    lot of details need to be considered, which can take a long time. Because
> >    "make generic" does not add new functions and does not improve performance,
> >    maybe I should say it's just a cleanup. So by stripping it out and leaving
> >    it for other patches later, we can aggregate the changes to the main functions.
> > 2. Use insert_resource() to replace request_resource(), this not only simplifies
> >    the code, but also reduces the differences between arm64 and x86 implementations.
> > 3. As commit 157752d84f5d ("kexec: use Crash kernel for Crash kernel low") do for
> >    x86, we can also extend kexec-tools for arm64, and it's currently applied. See:
> >    https://www.spinics.net/lists/kexec/msg28284.html
> > 
> > Thank you very much, Borislav Petkov, for so many valuable comments.
> > 
> > Changes since [v17]: v17 --> v19
> > 1. Patch 0001-0004
> >    Introduce generic parse_crashkernel_high_low() to bring the parsing of
> >    "crashkernel=X,high" and the parsing of "crashkernel=X,low" together,
> >    then use it instead of the call to parse_crashkernel_{high|low}(). Two
> >    confusing parameters of parse_crashkernel_{high|low}() are deleted.
> > 
> >    I previously sent these four patches separately:
> >    [1] https://lkml.org/lkml/2021/12/25/40
> > 2. Patch 0005-0009
> >    Introduce generic reserve_crashkernel_mem[_low](), the implementation of
> >    these two functions is based on function reserve_crashkernel[_low]() in
> >    arch/x86/kernel/setup.c. There is no functional change for x86.
> >    1) The check position of xen_pv_domain() does not change.
> >    2) Still 1M alignment for crash kernel fixed region, when 'base' is specified.
> > 
> >    To avoid compilation problems on other architectures: patch 0004 moves
> >    the definition of global variable crashk[_low]_res from kexec_core.c to
> >    crash_core.c, and provide default definitions for all macros involved, a
> >    particular platform can redefine these macros to override the default
> >    values.
> > 3. 0010, only one line of comment was changed.
> > 4. 0011
> >    1) crashk_low_res may also a valid reserved memory, should be checked
> >       in crash_is_nosave(), see arch/arm64/kernel/machine_kexec.
> >    2) Drop memblock_mark_nomap() for crashk_low_res, because of:
> >       2687275a5843 arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required
> >    3) Also call kmemleak_ignore_phys() for crashk_low_res, because of:
> >       85f58eb18898 arm64: kdump: Skip kmemleak scan reserved memory for kdump
> > 5. 0012, slightly rebased, because the following patch is applied in advance. 
> >    https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/commit/?h=dt/linus&id=8347b41748c3019157312fbe7f8a6792ae396eb7
> > 6. 0013, no change.
> > 
> > Others:
> > 1. Discard add ARCH_WANT_RESERVE_CRASH_KERNEL
> > 2. When allocating crash low memory, the start address still starts from 0.
> >    low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
> > 3. Discard change (1ULL << 32) to CRASH_ADDR_LOW_MAX.
> > 4. Ensure the check position of xen_pv_domain() have no change.
> > 5. Except patch 0010 and 0012, all "Tested-by", "Reviewed-by", "Acked-by" are removed.
> > 6. Update description.
> > 
> > 
> > 
> > Changes since [v16]
> > - Because no functional changes in this version, so add
> >   "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9
> > - Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8
> > - Update patch 9 based on the review comments of Rob Herring
> > - As Catalin Marinas's suggestion, merge the implementation of
> >   ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
> >   contents of X86 and ARM64 do not overlap, and reduce unnecessary
> >   temporary differences.
> > 
> > Changes since [v15]
> > -  Aggregate the processing of "linux,usable-memory-range" into one function.
> >    Only patch 9-10 have been updated.
> > 
> > Changes since [v14]
> > - Recovering the requirement that the CrashKernel memory regions on X86
> >   only requires 1 MiB alignment.
> > - Combine patches 5 and 6 in v14 into one. The compilation warning fixed
> >   by patch 6 was introduced by patch 5 in v14.
> > - As with crashk_res, crashk_low_res is also processed by
> >   crash_exclude_mem_range() in patch 7.
> > - Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
> >   has removed the architecture-specific code, extend the property "linux,usable-memory-range"
> >   in the platform-agnostic FDT core code. See patch 9.
> > - Discard the x86 description update in the document, because the description
> >   has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
> > - Change "arm64" to "ARM64" in Doc.
> > 
> > 
> > Changes since [v13]
> > - Rebased on top of 5.11-rc5.
> > - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
> > Since reserve_crashkernel[_low]() implementations are quite similar on
> > other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
> > arch/Kconfig and select this by X86 and ARM64.
> > - Some minor cleanup.
> > 
> > Changes since [v12]
> > - Rebased on top of 5.10-rc1.
> > - Keep CRASH_ALIGN as 16M suggested by Dave.
> > - Drop patch "kdump: add threshold for the required memory".
> > - Add Tested-by from John.
> > 
> > Changes since [v11]
> > - Rebased on top of 5.9-rc4.
> > - Make the function reserve_crashkernel() of x86 generic.
> > Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> > and arm64 use the generic version to reimplement crashkernel=X.
> > 
> > Changes since [v10]
> > - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
> > 
> > Changes since [v9]
> > - Patch 1 add Acked-by from Dave.
> > - Update patch 5 according to Dave's comments.
> > - Update chosen schema.
> > 
> > Changes since [v8]
> > - Reuse DT property "linux,usable-memory-range".
> > Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> > memory region.
> > - Fix kdump broken with ZONE_DMA reintroduced.
> > - Update chosen schema.
> > 
> > Changes since [v7]
> > - Move x86 CRASH_ALIGN to 2M
> > Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> > - Update Documentation/devicetree/bindings/chosen.txt.
> > Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
> > suggested by Arnd.
> > - Add Tested-by from Jhon and pk.
> > 
> > Changes since [v6]
> > - Fix build errors reported by kbuild test robot.
> > 
> > Changes since [v5]
> > - Move reserve_crashkernel_low() into kernel/crash_core.c.
> > - Delete crashkernel=X,high.
> > - Modify crashkernel=X,low.
> > If crashkernel=X,low is specified simultaneously, reserve spcified size low
> > memory for crash kdump kernel devices firstly and then reserve memory above 4G.
> > In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> > pass to crash dump kernel by DT property "linux,low-memory-range".
> > - Update Documentation/admin-guide/kdump/kdump.rst.
> > 
> > Changes since [v4]
> > - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> > 
> > Changes since [v3]
> > - Add memblock_cap_memory_ranges back for multiple ranges.
> > - Fix some compiling warnings.
> > 
> > Changes since [v2]
> > - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> > two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> > patch.
> > 
> > Changes since [v1]:
> > - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> > - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> > in fdt_enforce_memory_region().
> > There are at most two crash kernel regions, for two crash kernel regions
> > case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> > and then remove the memory range in the middle.
> > 
> > v1:
> > There are following issues in arm64 kdump:
> > 1. We use crashkernel=X to reserve crashkernel below 4G, which
> > will fail when there is no enough low memory.
> > 2. If reserving crashkernel above 4G, in this case, crash dump
> > kernel will boot failure because there is no low memory available
> > for allocation.
> > 
> > To solve these issues, change the behavior of crashkernel=X.
> > crashkernel=X tries low allocation in DMA zone and fall back to high
> > allocation if it fails.
> > 
> > We can also use "crashkernel=X,high" to select a high region above
> > DMA zone, which also tries to allocate at least 256M low memory in
> > DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> > specified size low memory.
> > 
> > When reserving crashkernel in high memory, some low memory is reserved
> > for crash dump kernel devices. So there may be two regions reserved for
> > crash dump kernel.
> > In order to distinct from the high region and make no effect to the use
> > of existing kexec-tools, rename the low region as "Crash kernel (low)",
> > and pass the low region by reusing DT property
> > "linux,usable-memory-range". We made the low memory region as the last
> > range of "linux,usable-memory-range" to keep compatibility with existing
> > user-space and older kdump kernels.
> > 
> > Besides, we need to modify kexec-tools:
> > arm64: support more than one crash kernel regions(see [1])
> > 
> > Another update is document about DT property 'linux,usable-memory-range':
> > schemas: update 'linux,usable-memory-range' node schema(see [2])
> > 
> > 
> > [1]: https://www.spinics.net/lists/kexec/msg28226.html
> > [2]: https://github.com/robherring/dt-schema/pull/19 
> > [v1]: https://lkml.org/lkml/2019/4/2/1174
> > [v2]: https://lkml.org/lkml/2019/4/9/86
> > [v3]: https://lkml.org/lkml/2019/4/9/306
> > [v4]: https://lkml.org/lkml/2019/4/15/273
> > [v5]: https://lkml.org/lkml/2019/5/6/1360
> > [v6]: https://lkml.org/lkml/2019/8/30/142
> > [v7]: https://lkml.org/lkml/2019/12/23/411
> > [v8]: https://lkml.org/lkml/2020/5/21/213
> > [v9]: https://lkml.org/lkml/2020/6/28/73
> > [v10]: https://lkml.org/lkml/2020/7/2/1443
> > [v11]: https://lkml.org/lkml/2020/8/1/150
> > [v12]: https://lkml.org/lkml/2020/9/7/1037
> > [v13]: https://lkml.org/lkml/2020/10/31/34
> > [v14]: https://lkml.org/lkml/2021/1/30/53
> > [v15]: https://lkml.org/lkml/2021/10/19/1405
> > [v16]: https://lkml.org/lkml/2021/11/23/435
> > [v17]: https://lkml.org/lkml/2021/12/10/38
> > [v18]: https://lkml.org/lkml/2021/12/22/424
> > [v19]: https://lkml.org/lkml/2021/12/28/203
> > [v20]: https://lkml.org/lkml/2022/1/24/167
> > [v21]: https://lkml.org/lkml/2022/2/26/350
> > 
> > Chen Zhou (2):
> >   arm64: kdump: Reimplement crashkernel=X
> >   of: fdt: Add memory for devices by DT property
> >     "linux,usable-memory-range"
> > 
> > Zhen Lei (7):
> >   kdump: return -ENOENT if required cmdline option does not exist
> >   arm64: Use insert_resource() to simplify code
> >   arm64: kdump: Remove some redundant checks in map_mem()
> >   arm64: kdump: Don't force page-level mappings for memory above 4G
> >   arm64: kdump: Use page-level mapping for the high memory of
> >     crashkernel
> >   arm64: kdump: Try not to use NO_BLOCK_MAPPINGS for memory under 4G
> >   docs: kdump: Update the crashkernel description for arm64
> > 
> >  .../admin-guide/kernel-parameters.txt         |   8 +-
> >  arch/arm64/include/asm/kexec.h                |   2 +
> >  arch/arm64/kernel/machine_kexec.c             |   9 +-
> >  arch/arm64/kernel/machine_kexec_file.c        |  12 +-
> >  arch/arm64/kernel/setup.c                     |  17 +-
> >  arch/arm64/mm/init.c                          | 204 +++++++++++++++++-
> >  arch/arm64/mm/mmu.c                           |  68 +++---
> >  drivers/of/fdt.c                              |  33 ++-
> >  kernel/crash_core.c                           |   3 +-
> >  9 files changed, 278 insertions(+), 78 deletions(-)
> > 
> 
> -- 
> Regards,
>   Zhen Lei
>
Baoquan He April 25, 2022, 3:49 a.m. UTC | #4
On 04/14/22 at 07:57pm, Zhen Lei wrote:
> According to the current crashkernel=Y,low support in other ARCHes, it's
> an optional command-line option. When it doesn't exist, kernel will try
> to allocate minimum required memory below 4G automatically.
> 
> However, __parse_crashkernel() returns '-EINVAL' for all error cases. It
> can't distinguish the nonexistent option from invalid option.
> 
> Change __parse_crashkernel() to return '-ENOENT' for the nonexistent option
> case. With this change, crashkernel,low memory will take the default
> value if crashkernel=,low is not specified; while crashkernel reservation
> will fail and bail out if an invalid option is specified.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>

Acked-by: Baoquan He <bhe@redhat.com>

> ---
>  kernel/crash_core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 256cf6db573cd09..4d57c03714f4e13 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -243,9 +243,8 @@ static int __init __parse_crashkernel(char *cmdline,
>  	*crash_base = 0;
>  
>  	ck_cmdline = get_last_crashkernel(cmdline, name, suffix);
> -
>  	if (!ck_cmdline)
> -		return -EINVAL;
> +		return -ENOENT;
>  
>  	ck_cmdline += strlen(name);
>  
> -- 
> 2.25.1
>
Leizhen (ThunderTown) April 25, 2022, 6:29 a.m. UTC | #5
On 2022/4/25 10:45, Baoquan He wrote:
> On 04/25/22 at 10:19am, Leizhen (ThunderTown) wrote:
>>
>> Hi, Catalin Marinas and Will Deacon:
>>   Do you have time to review patches 3-4 and 6-7? These patches are a little more
>> relevant to the arm64 architecture.
>>
>> Hi, Baoquan He:
>>   Can you review patches 1 and 5 again?
> 
> Yes, and I will test and review the whole series if no one check the
> arm64 related changes.

Okay, thank you so much.

> 
>>
>>
Catalin Marinas April 26, 2022, 2:26 p.m. UTC | #6
On Thu, Apr 14, 2022 at 07:57:15PM +0800, Zhen Lei wrote:
> @@ -540,13 +540,31 @@ static void __init map_mem(pgd_t *pgdp)
>  	for_each_mem_range(i, &start, &end) {
>  		if (start >= end)
>  			break;
> +
> +#ifdef CONFIG_KEXEC_CORE
> +		if (eflags && (end >= SZ_4G)) {
> +			/*
> +			 * The memory block cross the 4G boundary.
> +			 * Forcibly use page-level mappings for memory under 4G.
> +			 */
> +			if (start < SZ_4G) {
> +				__map_memblock(pgdp, start, SZ_4G - 1,
> +					       pgprot_tagged(PAGE_KERNEL), flags | eflags);
> +				start  = SZ_4G;
> +			}
> +
> +			/* Page-level mappings is not mandatory for memory above 4G */
> +			eflags = 0;
> +		}
> +#endif

That's a bit tricky if a SoC has all RAM above 4G. IIRC AMD Seattle had
this layout. See max_zone_phys() for how we deal with this, basically
extending ZONE_DMA to the whole range if RAM starts above 4GB. In that
case, crashkernel reservation would fall in the range above 4GB.

BTW, we changed the max_zone_phys() logic with commit 791ab8b2e3db
("arm64: Ignore any DMA offsets in the max_zone_phys() calculation").
Catalin Marinas April 26, 2022, 6:02 p.m. UTC | #7
On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote:
>  /*
>   * reserve_crashkernel() - reserves memory for crash kernel
>   *
>   * This function reserves memory area given in "crashkernel=" kernel command
>   * line parameter. The memory reserved is used by dump capture kernel when
>   * primary kernel is crashing.
> + *
> + * NOTE: Reservation of crashkernel,low is special since its existence
> + * is not independent, need rely on the existence of crashkernel,high.
> + * Here, four cases of crashkernel low memory reservation are summarized:
> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low
> + *    memory takes Y;
> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified,
> + *    take the default crashkernel low memory size;
> + * 3) crashkernel=X is specified, while fallback to get a memory region
> + *    in high memory, take the default crashkernel low memory size;
> + * 4) crashkernel='invalid value',low is specified, failed the whole
> + *    crashkernel reservation and bail out.

Following the x86 behaviour made sense when we were tried to get that
code generic. Now that we moved the logic under arch/arm64, we can
diverge a bit. I lost track of the original (v1/v2) proposal but I
wonder whether we still need the fallback to high for crashkernel=Y.
Maybe simpler, no fallbacks:

	crashkernel=Y - keep the current behaviour, ignore high,low
	crashkernel=Y,high - allocate above ZONE_DMA
	crashkernel=Y,low - allocate within ZONE_DMA

From your proposal, the difference is that the Y,high option won't
have any default ZONE_DMA fallback, one would have to explicitly pass
the Y,low option if needed.

Just a thought, maybe it makes the code simpler. But I'm open to
discussion if there are good arguments for the proposed (x86-like)
behaviour. One argument could be for crashkernel=Y to fall back to high
if distros don't want to bother with high/low settings.

Another thing I may have asked in the past, what happens if we run a new
kernel with these patches with old kexec user tools. I suspect the
crashkernel=Y with the fallback to high will confuse the tools.

BTW, please separate the NO_BLOCK_MAPPINGS optimisations from the
crashkernel above 4G. Let's get the crashkernel reservations sorted
first, it's been around for too long.

Thanks.
Leizhen (ThunderTown) April 27, 2022, 6:54 a.m. UTC | #8
On 2022/4/27 2:02, Catalin Marinas wrote:
> On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote:
>>  /*
>>   * reserve_crashkernel() - reserves memory for crash kernel
>>   *
>>   * This function reserves memory area given in "crashkernel=" kernel command
>>   * line parameter. The memory reserved is used by dump capture kernel when
>>   * primary kernel is crashing.
>> + *
>> + * NOTE: Reservation of crashkernel,low is special since its existence
>> + * is not independent, need rely on the existence of crashkernel,high.
>> + * Here, four cases of crashkernel low memory reservation are summarized:
>> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low
>> + *    memory takes Y;
>> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified,
>> + *    take the default crashkernel low memory size;
>> + * 3) crashkernel=X is specified, while fallback to get a memory region
>> + *    in high memory, take the default crashkernel low memory size;
>> + * 4) crashkernel='invalid value',low is specified, failed the whole
>> + *    crashkernel reservation and bail out.
> 
> Following the x86 behaviour made sense when we were tried to get that
> code generic. Now that we moved the logic under arch/arm64, we can
> diverge a bit. I lost track of the original (v1/v2) proposal but I
> wonder whether we still need the fallback to high for crashkernel=Y.

I don't think anyone has raised this demand yet! If it weren't for the
fact that crashkernel=X appeared earlier, it would probably have been
enough for a combination of crashkernel=X,high and crashkernel=Y,low.

In fact, I also tend not to support "fallback to high for crashkernel=Y".
I took over this from Chen Zhou. In the absence of any objection, I had
to inherit. Now that you've brought it up, I'm happy to delete it.
Supporting this feature complicates the code logic a lot. The point is,
it's not fully backwards compatible yet. For example, someone may want
crashkernel=3G to report failure, but the the new support make it work.

> Maybe simpler, no fallbacks:
> 
> 	crashkernel=Y - keep the current behaviour, ignore high,low
> 	crashkernel=Y,high - allocate above ZONE_DMA
> 	crashkernel=Y,low - allocate within ZONE_DMA
> 
>>From your proposal, the difference is that the Y,high option won't
> have any default ZONE_DMA fallback, one would have to explicitly pass
> the Y,low option if needed.

I agree with you. Now we don't need code generic, so there is no need to
carry the historical burden of other ARCHs. arm64 does not need to delve
into that empirical value(the default size of crash low memory).

> 
> Just a thought, maybe it makes the code simpler. But I'm open to
> discussion if there are good arguments for the proposed (x86-like)
> behaviour. One argument could be for crashkernel=Y to fall back to high
> if distros don't want to bother with high/low settings.

I think distros should take precedence over "crashkernel=Y,high". After all,
ZONE_DMA memory is more valuable than high memory.


> 
> Another thing I may have asked in the past, what happens if we run a new
> kernel with these patches with old kexec user tools. I suspect the
> crashkernel=Y with the fallback to high will confuse the tools.

If crashkernel=Y can reserve the memory in Zone_DMA successfully, the old
kexec works well. But if crashkernel=Y fallback to high memory, the second
kernel will boot failed, because the old kexec can only use dtb to pass the
high memory range to the second kernel. In comparison, if no fallback, we can
see crash memory reservation failure in the first kernel, so we have a chance
to adjust Y.

Currently, the new kexec tool will pick the last memory range(sorted by address
in ascending order) to store Image,dtb,initrd.


> 
> BTW, please separate the NO_BLOCK_MAPPINGS optimisations from the
> crashkernel above 4G. Let's get the crashkernel reservations sorted
> first, it's been around for too long.

OK, thank you. That's a good suggestion.

> 
> Thanks.
>
Leizhen (ThunderTown) April 27, 2022, 7:12 a.m. UTC | #9
On 2022/4/26 22:26, Catalin Marinas wrote:
> On Thu, Apr 14, 2022 at 07:57:15PM +0800, Zhen Lei wrote:
>> @@ -540,13 +540,31 @@ static void __init map_mem(pgd_t *pgdp)
>>  	for_each_mem_range(i, &start, &end) {
>>  		if (start >= end)
>>  			break;
>> +
>> +#ifdef CONFIG_KEXEC_CORE
>> +		if (eflags && (end >= SZ_4G)) {
>> +			/*
>> +			 * The memory block cross the 4G boundary.
>> +			 * Forcibly use page-level mappings for memory under 4G.
>> +			 */
>> +			if (start < SZ_4G) {
>> +				__map_memblock(pgdp, start, SZ_4G - 1,
>> +					       pgprot_tagged(PAGE_KERNEL), flags | eflags);
>> +				start  = SZ_4G;
>> +			}
>> +
>> +			/* Page-level mappings is not mandatory for memory above 4G */
>> +			eflags = 0;
>> +		}
>> +#endif
> 
> That's a bit tricky if a SoC has all RAM above 4G. IIRC AMD Seattle had
> this layout. See max_zone_phys() for how we deal with this, basically
> extending ZONE_DMA to the whole range if RAM starts above 4GB. In that
> case, crashkernel reservation would fall in the range above 4GB.
> 
> BTW, we changed the max_zone_phys() logic with commit 791ab8b2e3db
> ("arm64: Ignore any DMA offsets in the max_zone_phys() calculation").

Okay, thanks for your correction. I'll dig into it after I've done the original requirement.

>
Catalin Marinas April 27, 2022, 12:32 p.m. UTC | #10
On Wed, Apr 27, 2022 at 02:54:52PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/4/27 2:02, Catalin Marinas wrote:
> > On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote:
> >>  /*
> >>   * reserve_crashkernel() - reserves memory for crash kernel
> >>   *
> >>   * This function reserves memory area given in "crashkernel=" kernel command
> >>   * line parameter. The memory reserved is used by dump capture kernel when
> >>   * primary kernel is crashing.
> >> + *
> >> + * NOTE: Reservation of crashkernel,low is special since its existence
> >> + * is not independent, need rely on the existence of crashkernel,high.
> >> + * Here, four cases of crashkernel low memory reservation are summarized:
> >> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low
> >> + *    memory takes Y;
> >> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified,
> >> + *    take the default crashkernel low memory size;
> >> + * 3) crashkernel=X is specified, while fallback to get a memory region
> >> + *    in high memory, take the default crashkernel low memory size;
> >> + * 4) crashkernel='invalid value',low is specified, failed the whole
> >> + *    crashkernel reservation and bail out.
> > 
> > Following the x86 behaviour made sense when we were tried to get that
> > code generic. Now that we moved the logic under arch/arm64, we can
> > diverge a bit. I lost track of the original (v1/v2) proposal but I
> > wonder whether we still need the fallback to high for crashkernel=Y.
> 
> I don't think anyone has raised this demand yet! If it weren't for the
> fact that crashkernel=X appeared earlier, it would probably have been
> enough for a combination of crashkernel=X,high and crashkernel=Y,low.
> 
> In fact, I also tend not to support "fallback to high for crashkernel=Y".
> I took over this from Chen Zhou. In the absence of any objection, I had
> to inherit. Now that you've brought it up, I'm happy to delete it.
> Supporting this feature complicates the code logic a lot. The point is,
> it's not fully backwards compatible yet. For example, someone may want
> crashkernel=3G to report failure, but the the new support make it work.

BTW, prior to v20, this patch had this line:

	crashk_low_res.name = "Crash kernel (low)";

I can't find it anymore. Do the kexec tools need to distinguish between
low and high or they can cope with multiple "Crash kernel" entries?

> > Maybe simpler, no fallbacks:
> > 
> > 	crashkernel=Y - keep the current behaviour, ignore high,low
> > 	crashkernel=Y,high - allocate above ZONE_DMA
> > 	crashkernel=Y,low - allocate within ZONE_DMA
> > 
> > From your proposal, the difference is that the Y,high option won't
> > have any default ZONE_DMA fallback, one would have to explicitly pass
> > the Y,low option if needed.
> 
> I agree with you. Now we don't need code generic, so there is no need to
> carry the historical burden of other ARCHs. arm64 does not need to delve
> into that empirical value(the default size of crash low memory).
> 
> > Just a thought, maybe it makes the code simpler. But I'm open to
> > discussion if there are good arguments for the proposed (x86-like)
> > behaviour. One argument could be for crashkernel=Y to fall back to high
> > if distros don't want to bother with high/low settings.
> 
> I think distros should take precedence over "crashkernel=Y,high". After all,
> ZONE_DMA memory is more valuable than high memory.

My point is whether an admin configuring the kernel command line needs
to know the layout of ZONE_DMA etc. to figure out how much to pass in
high and low. The fallbacks in this case have some value but they also
complicate the code logic. The 4GB limit does not always make sense
either for some platforms (RPi4 has a ZONE_DMA limit of 1GB).

I think one could always pass a default command line like:

	crashkernel=1G,high crashkernel=128M,low

without much knowledge of the SoC memory layout.

Another option is to only introduce crashkernel=Y,low and, when that is
passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
'high' option at all:

	crashkernel=1G				- all within ZONE_DMA
	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
						  1G above ZONE_DMA

If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
the 'low' option.
Leizhen (ThunderTown) April 27, 2022, 1:49 p.m. UTC | #11
On 2022/4/27 20:32, Catalin Marinas wrote:
> On Wed, Apr 27, 2022 at 02:54:52PM +0800, Leizhen (ThunderTown) wrote:
>> On 2022/4/27 2:02, Catalin Marinas wrote:
>>> On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote:
>>>>  /*
>>>>   * reserve_crashkernel() - reserves memory for crash kernel
>>>>   *
>>>>   * This function reserves memory area given in "crashkernel=" kernel command
>>>>   * line parameter. The memory reserved is used by dump capture kernel when
>>>>   * primary kernel is crashing.
>>>> + *
>>>> + * NOTE: Reservation of crashkernel,low is special since its existence
>>>> + * is not independent, need rely on the existence of crashkernel,high.
>>>> + * Here, four cases of crashkernel low memory reservation are summarized:
>>>> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low
>>>> + *    memory takes Y;
>>>> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified,
>>>> + *    take the default crashkernel low memory size;
>>>> + * 3) crashkernel=X is specified, while fallback to get a memory region
>>>> + *    in high memory, take the default crashkernel low memory size;
>>>> + * 4) crashkernel='invalid value',low is specified, failed the whole
>>>> + *    crashkernel reservation and bail out.
>>>
>>> Following the x86 behaviour made sense when we were tried to get that
>>> code generic. Now that we moved the logic under arch/arm64, we can
>>> diverge a bit. I lost track of the original (v1/v2) proposal but I
>>> wonder whether we still need the fallback to high for crashkernel=Y.
>>
>> I don't think anyone has raised this demand yet! If it weren't for the
>> fact that crashkernel=X appeared earlier, it would probably have been
>> enough for a combination of crashkernel=X,high and crashkernel=Y,low.
>>
>> In fact, I also tend not to support "fallback to high for crashkernel=Y".
>> I took over this from Chen Zhou. In the absence of any objection, I had
>> to inherit. Now that you've brought it up, I'm happy to delete it.
>> Supporting this feature complicates the code logic a lot. The point is,
>> it's not fully backwards compatible yet. For example, someone may want
>> crashkernel=3G to report failure, but the the new support make it work.
> 
> BTW, prior to v20, this patch had this line:
> 
> 	crashk_low_res.name = "Crash kernel (low)";
> 
> I can't find it anymore. Do the kexec tools need to distinguish between
> low and high or they can cope with multiple "Crash kernel" entries?

Yes, I've updated the kexec tool patch based on Borislav Petkov's comments
to keep it the same as x86. And this patch has been merged into kexec v2.0.24.
b5a34a2 arm64: support more than one crash kernel regions

The kexec tool will first sorted all "Crash kernel" memory range in ascending
order, then choose the last one (the memory range with the highest address) to
store the second kernel's Image,dtb,initrd.

> 
>>> Maybe simpler, no fallbacks:
>>>
>>> 	crashkernel=Y - keep the current behaviour, ignore high,low
>>> 	crashkernel=Y,high - allocate above ZONE_DMA
>>> 	crashkernel=Y,low - allocate within ZONE_DMA
>>>
>>> From your proposal, the difference is that the Y,high option won't
>>> have any default ZONE_DMA fallback, one would have to explicitly pass
>>> the Y,low option if needed.
>>
>> I agree with you. Now we don't need code generic, so there is no need to
>> carry the historical burden of other ARCHs. arm64 does not need to delve
>> into that empirical value(the default size of crash low memory).
>>
>>> Just a thought, maybe it makes the code simpler. But I'm open to
>>> discussion if there are good arguments for the proposed (x86-like)
>>> behaviour. One argument could be for crashkernel=Y to fall back to high
>>> if distros don't want to bother with high/low settings.
>>
>> I think distros should take precedence over "crashkernel=Y,high". After all,
>> ZONE_DMA memory is more valuable than high memory.
> 
> My point is whether an admin configuring the kernel command line needs
> to know the layout of ZONE_DMA etc. to figure out how much to pass in

No need.

> high and low. The fallbacks in this case have some value but they also
> complicate the code logic. The 4GB limit does not always make sense
> either for some platforms (RPi4 has a ZONE_DMA limit of 1GB).
> 
> I think one could always pass a default command line like:
> 
> 	crashkernel=1G,high crashkernel=128M,low
> 
> without much knowledge of the SoC memory layout.

Yes, that's what the end result is. The user specify crashkernel=128M,low
and the implementation ensure the 128M low memory is allocated from DMA zone.
We use arm64_dma_phys_limit as the upper limit for crash low memory.

+#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
+       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
+       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
                                               crash_base, crash_max);

> 
> Another option is to only introduce crashkernel=Y,low and, when that is
> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
> 'high' option at all:
> 
> 	crashkernel=1G				- all within ZONE_DMA
> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
> 						  1G above ZONE_DMA
> 
> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
> the 'low' option.

I think although the code is hard to make generic, the interface is better to
be relatively uniform. A user might have to maintain both x86 and arm64, and
so on. It's not a good thing that the difference is too big.

A well had already been dug by the forefathers, and it seemed unnecessary to
dig another well beside it if there was no particular advantage.

>
Catalin Marinas April 27, 2022, 4:04 p.m. UTC | #12
On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/4/27 20:32, Catalin Marinas wrote:
> > I think one could always pass a default command line like:
> > 
> > 	crashkernel=1G,high crashkernel=128M,low
> > 
> > without much knowledge of the SoC memory layout.
> 
> Yes, that's what the end result is. The user specify crashkernel=128M,low
> and the implementation ensure the 128M low memory is allocated from DMA zone.
> We use arm64_dma_phys_limit as the upper limit for crash low memory.
> 
> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>                                                crash_base, crash_max);
> 
> > Another option is to only introduce crashkernel=Y,low and, when that is
> > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
> > 'high' option at all:
> > 
> > 	crashkernel=1G				- all within ZONE_DMA
> > 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
> > 						  1G above ZONE_DMA
> > 
> > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
> > the 'low' option.
> 
> I think although the code is hard to make generic, the interface is better to
> be relatively uniform. A user might have to maintain both x86 and arm64, and
> so on. It's not a good thing that the difference is too big.

There will be some difference as the 4G limit doesn't always hold for
arm64 (though it's true in most cases). Anyway, we can probably simplify
things a bit while following the documented behaviour:

	crashkernel=Y		- current behaviour within ZONE_DMA
	crashkernel=Y,high	- allocate from above ZONE_DMA
	crashkernel=Y,low	- allocate within ZONE_DMA

There is no fallback from crashkernel=Y.

The question is whether we still want a default low allocation if
crashkernel=Y,low is missing but 'high' is present. If we add this, I
think we'd be consistent with kernel-parameters.txt for the 'low'
description. A default 'low' is probably not that bad but I'm tempted to
always mandate both 'high' and 'low'.
Leizhen (ThunderTown) April 28, 2022, 2:22 a.m. UTC | #13
On 2022/4/28 0:04, Catalin Marinas wrote:
> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
>> On 2022/4/27 20:32, Catalin Marinas wrote:
>>> I think one could always pass a default command line like:
>>>
>>> 	crashkernel=1G,high crashkernel=128M,low
>>>
>>> without much knowledge of the SoC memory layout.
>>
>> Yes, that's what the end result is. The user specify crashkernel=128M,low
>> and the implementation ensure the 128M low memory is allocated from DMA zone.
>> We use arm64_dma_phys_limit as the upper limit for crash low memory.
>>
>> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
>> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>>                                                crash_base, crash_max);
>>
>>> Another option is to only introduce crashkernel=Y,low and, when that is
>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
>>> 'high' option at all:
>>>
>>> 	crashkernel=1G				- all within ZONE_DMA
>>> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
>>> 						  1G above ZONE_DMA
>>>
>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
>>> the 'low' option.
>>
>> I think although the code is hard to make generic, the interface is better to
>> be relatively uniform. A user might have to maintain both x86 and arm64, and
>> so on. It's not a good thing that the difference is too big.
> 
> There will be some difference as the 4G limit doesn't always hold for
> arm64 (though it's true in most cases). Anyway, we can probably simplify
> things a bit while following the documented behaviour:
> 
> 	crashkernel=Y		- current behaviour within ZONE_DMA
> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> 	crashkernel=Y,low	- allocate within ZONE_DMA
> 
> There is no fallback from crashkernel=Y.

Yes, I followed your guidelines yesterday to modify the code. Now the code flow
is much clearer.

> 
> The question is whether we still want a default low allocation if
> crashkernel=Y,low is missing but 'high' is present. If we add this, I
> think we'd be consistent with kernel-parameters.txt for the 'low'
> description. A default 'low' is probably not that bad but I'm tempted to
> always mandate both 'high' and 'low'.

Yes, I agree with you. Because the situation is complicated, the default value
is hard to be accurate. It's better to let the user configure it according to
the actual situation, they're also programmers.

Whether mandate both 'high' and 'low', or allow only 'high' like x86(but the default
value becomes zero). I prefer the latter. The size of 'low' maybe zero, for example,
SMMU is enabled on the second kernel. If only high memory is required, only that
high memory needs to be configured, seems more reasonable.

>
Baoquan He April 28, 2022, 3:40 a.m. UTC | #14
Hi Catalin, Zhen Lei,

On 04/27/22 at 05:04pm, Catalin Marinas wrote:
> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
> > On 2022/4/27 20:32, Catalin Marinas wrote:
> > > I think one could always pass a default command line like:
> > > 
> > > 	crashkernel=1G,high crashkernel=128M,low
> > > 
> > > without much knowledge of the SoC memory layout.
> > 
> > Yes, that's what the end result is. The user specify crashkernel=128M,low
> > and the implementation ensure the 128M low memory is allocated from DMA zone.
> > We use arm64_dma_phys_limit as the upper limit for crash low memory.
> > 
> > +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
> > +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> > +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> >                                                crash_base, crash_max);
> > 
> > > Another option is to only introduce crashkernel=Y,low and, when that is
> > > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
> > > 'high' option at all:
> > > 
> > > 	crashkernel=1G				- all within ZONE_DMA
> > > 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
> > > 						  1G above ZONE_DMA
> > > 
> > > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
> > > the 'low' option.
> > 
> > I think although the code is hard to make generic, the interface is better to
> > be relatively uniform. A user might have to maintain both x86 and arm64, and
> > so on. It's not a good thing that the difference is too big.
> 
> There will be some difference as the 4G limit doesn't always hold for
> arm64 (though it's true in most cases). Anyway, we can probably simplify
> things a bit while following the documented behaviour:
> 
> 	crashkernel=Y		- current behaviour within ZONE_DMA
> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> 	crashkernel=Y,low	- allocate within ZONE_DMA
> 
> There is no fallback from crashkernel=Y.
> 
> The question is whether we still want a default low allocation if
> crashkernel=Y,low is missing but 'high' is present. If we add this, I
> think we'd be consistent with kernel-parameters.txt for the 'low'
> description. A default 'low' is probably not that bad but I'm tempted to
> always mandate both 'high' and 'low'.

Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
about this version. And I have the same concerns about them which comes
from below points:
1) we may need to take best effort to keep ,high, ,low behaviour
consistent on all ARCHes. Otherwise user/admin may be confused when they
deploy/configure kdump on different machines of different ARCHes in the
same LAB. I think we should try to avoid the confusion.
2) Fallback behaviour is important to our distros. The reason is we will
provide default value with crashkernel=xxxM along kernel of distros. In
this case, we hope the reservation will succeed by all means. The ,high
and ,low is an option if customer likes to take with expertise.

After going through arm64 memory init code, I got below summary about
arm64_dma_phys_limit which is the first zone's upper limit. I think we
can make use of it to facilitate to simplify code.
================================================================================
                        DMA                      DMA32                    NORMAL
1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
2)Normal machine        0~4G                     0                        (above 4G)
3)Special machine       (above 4G)~MAX
4)No DMA|DMA32                                                            (above 4G)~MAX

-------------------------------------------
                      arm64_dma_phys_limit
1)Raspberry Pi4         1G                     
2)Normal machine        4G                     
3)Special machine       MAX
4)No DMA|DMA32          MAX

Note: 3)Special machine means the machine's starting physical address is above 4G.
WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
IOMMU hardware supporting.
===================================================================================

I made a draft patch based on this patchset, please feel free to check and
see if it's OK, or anything missing or wrongly understood. I removed
reserve_crashkernel_high() and only keep reserve_crashkernel() and
reserve_crashkernel_low() as the v21 did.

Thanks
Baoquan
Leizhen (ThunderTown) April 28, 2022, 9:33 a.m. UTC | #15
On 2022/4/28 11:52, Baoquan He wrote:
> On 04/28/22 at 11:40am, Baoquan He wrote:
>> Hi Catalin, Zhen Lei,
>>
>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
>>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
>>>> On 2022/4/27 20:32, Catalin Marinas wrote:
>>>>> I think one could always pass a default command line like:
>>>>>
>>>>> 	crashkernel=1G,high crashkernel=128M,low
>>>>>
>>>>> without much knowledge of the SoC memory layout.
>>>>
>>>> Yes, that's what the end result is. The user specify crashkernel=128M,low
>>>> and the implementation ensure the 128M low memory is allocated from DMA zone.
>>>> We use arm64_dma_phys_limit as the upper limit for crash low memory.
>>>>
>>>> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
>>>> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>>>> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>>>>                                                crash_base, crash_max);
>>>>
>>>>> Another option is to only introduce crashkernel=Y,low and, when that is
>>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
>>>>> 'high' option at all:
>>>>>
>>>>> 	crashkernel=1G				- all within ZONE_DMA
>>>>> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
>>>>> 						  1G above ZONE_DMA
>>>>>
>>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
>>>>> the 'low' option.
>>>>
>>>> I think although the code is hard to make generic, the interface is better to
>>>> be relatively uniform. A user might have to maintain both x86 and arm64, and
>>>> so on. It's not a good thing that the difference is too big.
>>>
>>> There will be some difference as the 4G limit doesn't always hold for
>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
>>> things a bit while following the documented behaviour:
>>>
>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
>>>
>>> There is no fallback from crashkernel=Y.
>>>
>>> The question is whether we still want a default low allocation if
>>> crashkernel=Y,low is missing but 'high' is present. If we add this, I
>>> think we'd be consistent with kernel-parameters.txt for the 'low'
>>> description. A default 'low' is probably not that bad but I'm tempted to
>>> always mandate both 'high' and 'low'.
>>
>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
>> about this version. And I have the same concerns about them which comes
>> from below points:
>> 1) we may need to take best effort to keep ,high, ,low behaviour
>> consistent on all ARCHes. Otherwise user/admin may be confused when they
>> deploy/configure kdump on different machines of different ARCHes in the
>> same LAB. I think we should try to avoid the confusion.

Yes, but for someone who is configuring crashkernel= for the first time, he
needs to read doc to understand how to configure it. The doc can show the
recommended default value of 'low' size.

After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"),
the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size()
is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low.


+                * -swiotlb size: user-specified with swiotlb= or default.
-               low_size = swiotlb_size_or_default() + (8UL<<20);
+               low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20);

That means all ARCHs can explicit configure crashkernel=256M,low, instead of
omitting it. This may be another way to avoid confusion. It's not hard for
programmer-turned-user/admin. However, this requires us to forgo backward
compatibility with the default size of 'low'.


>> 2) Fallback behaviour is important to our distros. The reason is we will
>> provide default value with crashkernel=xxxM along kernel of distros. In
>> this case, we hope the reservation will succeed by all means. The ,high
>> and ,low is an option if customer likes to take with expertise.

OK, I got it.

>>
>> After going through arm64 memory init code, I got below summary about
>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
>> can make use of it to facilitate to simplify code.
>> ================================================================================
>>                         DMA                      DMA32                    NORMAL
>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
>> 2)Normal machine        0~4G                     0                        (above 4G)
>> 3)Special machine       (above 4G)~MAX
>> 4)No DMA|DMA32                                                            (above 4G)~MAX

arm64_memblock_init()
	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
paging_init()                                       |
	map_mem()                                   |
unflatten_device_tree or ACPI                       |  ----  //Raspberry Pi4 get dma zone base on dtb or ACPI
bootmem_init();                                     |      |
	zone_sizes_init()                           |      |
		of_dma_get_max_cpu_address          |  ----|
		//Update arm64_dma_phys_limit       |  ----|
	reserve_crashkernel()        <--------------  //Because we need arm64_dma_phys_limit to be updated above
request_standard_resources()

>>
>> -------------------------------------------
>>                       arm64_dma_phys_limit
>> 1)Raspberry Pi4         1G                     
>> 2)Normal machine        4G                     
>> 3)Special machine       MAX
>> 4)No DMA|DMA32          MAX
>>
>> Note: 3)Special machine means the machine's starting physical address is above 4G.
>> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
>> IOMMU hardware supporting.
>> ===================================================================================
>>
>> I made a draft patch based on this patchset, please feel free to check and
>> see if it's OK, or anything missing or wrongly understood. I removed
>> reserve_crashkernel_high() and only keep reserve_crashkernel() and
>> reserve_crashkernel_low() as the v21 did.
> 
> Sorry, forgot attaching the draft patch.
> 
> By the way, we can also have a simple version with basic ,high, ,low
> support, no fallback. We can add fallback and other optimization later.
> This can be plan B.

Yes, That's what Catalin suggested also.

Hi, Baoquan He:
  Without optimization, the whole Patch 3-4 and 6-7 can be dropped.

Process after abstraction:
	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
		reserve_crashkernel()
		//block mapping
	} else {
		//page mapping
		reserve_crashkernel()
	}

------------ Simplified real-world process ---------
arm64_memblock_init()
	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
		reserve_crashkernel()
paging_init()
	map_mem()
		if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
			//block mapping
		else
			//page mapping
unflatten_device_tree or ACPI
bootmem_init();
	zone_sizes_init()
		of_dma_get_max_cpu_address
		//Update arm64_dma_phys_limit
	if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
		reserve_crashkernel()


> 
>
Baoquan He April 29, 2022, 3:24 a.m. UTC | #16
On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/4/28 11:52, Baoquan He wrote:
> > On 04/28/22 at 11:40am, Baoquan He wrote:
> >> Hi Catalin, Zhen Lei,
> >>
> >> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
> >>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
> >>>> On 2022/4/27 20:32, Catalin Marinas wrote:
> >>>>> I think one could always pass a default command line like:
> >>>>>
> >>>>> 	crashkernel=1G,high crashkernel=128M,low
> >>>>>
> >>>>> without much knowledge of the SoC memory layout.
> >>>>
> >>>> Yes, that's what the end result is. The user specify crashkernel=128M,low
> >>>> and the implementation ensure the 128M low memory is allocated from DMA zone.
> >>>> We use arm64_dma_phys_limit as the upper limit for crash low memory.
> >>>>
> >>>> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
> >>>> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> >>>> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> >>>>                                                crash_base, crash_max);
> >>>>
> >>>>> Another option is to only introduce crashkernel=Y,low and, when that is
> >>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
> >>>>> 'high' option at all:
> >>>>>
> >>>>> 	crashkernel=1G				- all within ZONE_DMA
> >>>>> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
> >>>>> 						  1G above ZONE_DMA
> >>>>>
> >>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
> >>>>> the 'low' option.
> >>>>
> >>>> I think although the code is hard to make generic, the interface is better to
> >>>> be relatively uniform. A user might have to maintain both x86 and arm64, and
> >>>> so on. It's not a good thing that the difference is too big.
> >>>
> >>> There will be some difference as the 4G limit doesn't always hold for
> >>> arm64 (though it's true in most cases). Anyway, we can probably simplify
> >>> things a bit while following the documented behaviour:
> >>>
> >>> 	crashkernel=Y		- current behaviour within ZONE_DMA
> >>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> >>> 	crashkernel=Y,low	- allocate within ZONE_DMA
> >>>
> >>> There is no fallback from crashkernel=Y.
> >>>
> >>> The question is whether we still want a default low allocation if
> >>> crashkernel=Y,low is missing but 'high' is present. If we add this, I
> >>> think we'd be consistent with kernel-parameters.txt for the 'low'
> >>> description. A default 'low' is probably not that bad but I'm tempted to
> >>> always mandate both 'high' and 'low'.
> >>
> >> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
> >> about this version. And I have the same concerns about them which comes
> >> from below points:
> >> 1) we may need to take best effort to keep ,high, ,low behaviour
> >> consistent on all ARCHes. Otherwise user/admin may be confused when they
> >> deploy/configure kdump on different machines of different ARCHes in the
> >> same LAB. I think we should try to avoid the confusion.
> 
> Yes, but for someone who is configuring crashkernel= for the first time, he
> needs to read doc to understand how to configure it. The doc can show the
> recommended default value of 'low' size.
> 
> After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"),
> the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size()
> is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low.
> 
> 
> +                * -swiotlb size: user-specified with swiotlb= or default.
> -               low_size = swiotlb_size_or_default() + (8UL<<20);
> +               low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20);
> 
> That means all ARCHs can explicit configure crashkernel=256M,low, instead of
> omitting it. This may be another way to avoid confusion. It's not hard for
> programmer-turned-user/admin. However, this requires us to forgo backward
> compatibility with the default size of 'low'.

We can make ,high and ,low simpler at first as they are alternative. If
possible, we can also simplify the ,high ,low implementation on x86_64
if it truly brings better archievement on arm64.

> 
> 
> >> 2) Fallback behaviour is important to our distros. The reason is we will
> >> provide default value with crashkernel=xxxM along kernel of distros. In
> >> this case, we hope the reservation will succeed by all means. The ,high
> >> and ,low is an option if customer likes to take with expertise.
> 
> OK, I got it.
> 
> >>
> >> After going through arm64 memory init code, I got below summary about
> >> arm64_dma_phys_limit which is the first zone's upper limit. I think we
> >> can make use of it to facilitate to simplify code.
> >> ================================================================================
> >>                         DMA                      DMA32                    NORMAL
> >> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
> >> 2)Normal machine        0~4G                     0                        (above 4G)
> >> 3)Special machine       (above 4G)~MAX
> >> 4)No DMA|DMA32                                                            (above 4G)~MAX
> 
> arm64_memblock_init()
> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
We don't need different code for this place of reservation as you are
doing in this patchset, since arm64_dma_phys_limit is initialized as 
below. In fact, in arm64_memblock_init(), we have made memblock ready,
we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
memblock_start_of_DRAM() is bigger than 4G, we possibly can call
reserve_crashkernel() here too.

phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;

> paging_init()                                       |
> 	map_mem()                                   |
> unflatten_device_tree or ACPI                       |  ----  //Raspberry Pi4 get dma zone base on dtb or ACPI
> bootmem_init();                                     |      |
> 	zone_sizes_init()                           |      |
> 		of_dma_get_max_cpu_address          |  ----|
> 		//Update arm64_dma_phys_limit       |  ----|
> 	reserve_crashkernel()        <--------------  //Because we need arm64_dma_phys_limit to be updated above
> request_standard_resources()

Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case
as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But 
arm64_dma_phys_limit could be 1G or 4G, that's why your optimization
about BLOCKING may not be right since you assume the 4G boundary, while
forgetting Raspberry Pi4 on which 1G is the boundary of low memory and
high memory. So separating out BLOCKING optimization can let us focus on
the crashkernel,high support.

> 
> >>
> >> -------------------------------------------
> >>                       arm64_dma_phys_limit
> >> 1)Raspberry Pi4         1G                     
> >> 2)Normal machine        4G                     
> >> 3)Special machine       MAX
> >> 4)No DMA|DMA32          MAX
> >>
> >> Note: 3)Special machine means the machine's starting physical address is above 4G.
> >> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
> >> IOMMU hardware supporting.
> >> ===================================================================================
> >>
> >> I made a draft patch based on this patchset, please feel free to check and
> >> see if it's OK, or anything missing or wrongly understood. I removed
> >> reserve_crashkernel_high() and only keep reserve_crashkernel() and
> >> reserve_crashkernel_low() as the v21 did.
> > 
> > Sorry, forgot attaching the draft patch.
> > 
> > By the way, we can also have a simple version with basic ,high, ,low
> > support, no fallback. We can add fallback and other optimization later.
> > This can be plan B.
> 
> Yes, That's what Catalin suggested also.
> 
> Hi, Baoquan He:
>   Without optimization, the whole Patch 3-4 and 6-7 can be dropped.
> 
> Process after abstraction:
> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
> 		reserve_crashkernel()
> 		//block mapping
> 	} else {
> 		//page mapping
> 		reserve_crashkernel()
> 	}
> 
> ------------ Simplified real-world process ---------
Yeah, this looks clearer. I would like to see a version with them.

> arm64_memblock_init()
        Before reserve_crashkernel(), we can update arm64_dma_phys_limit
as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or
memblock_start_of_DRAM() is bigger than 4G.
        Then we go with:
        if (!arm64_dma_phys_limit)
		reserve_crashkernel();

Just personal opinion, please check if it's appropriate to handle case
3) which has physical starting memory above 4G here. 

> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
> 		reserve_crashkernel()
           
> paging_init()
> 	map_mem()
> 		if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
> 			//block mapping
> 		else
> 			//page mapping
> unflatten_device_tree or ACPI
> bootmem_init();
> 	zone_sizes_init()
> 		of_dma_get_max_cpu_address
> 		//Update arm64_dma_phys_limit
> 	if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
> 		reserve_crashkernel()

The rest sounds good with optimization code split out.
Leizhen (ThunderTown) April 29, 2022, 8:02 a.m. UTC | #17
On 2022/4/29 11:24, Baoquan He wrote:
> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/4/28 11:52, Baoquan He wrote:
>>> On 04/28/22 at 11:40am, Baoquan He wrote:
>>>> Hi Catalin, Zhen Lei,
>>>>
>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
>>>>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
>>>>>> On 2022/4/27 20:32, Catalin Marinas wrote:
>>>>>>> I think one could always pass a default command line like:
>>>>>>>
>>>>>>> 	crashkernel=1G,high crashkernel=128M,low
>>>>>>>
>>>>>>> without much knowledge of the SoC memory layout.
>>>>>>
>>>>>> Yes, that's what the end result is. The user specify crashkernel=128M,low
>>>>>> and the implementation ensure the 128M low memory is allocated from DMA zone.
>>>>>> We use arm64_dma_phys_limit as the upper limit for crash low memory.
>>>>>>
>>>>>> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
>>>>>> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>>>>>> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>>>>>>                                                crash_base, crash_max);
>>>>>>
>>>>>>> Another option is to only introduce crashkernel=Y,low and, when that is
>>>>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
>>>>>>> 'high' option at all:
>>>>>>>
>>>>>>> 	crashkernel=1G				- all within ZONE_DMA
>>>>>>> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
>>>>>>> 						  1G above ZONE_DMA
>>>>>>>
>>>>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
>>>>>>> the 'low' option.
>>>>>>
>>>>>> I think although the code is hard to make generic, the interface is better to
>>>>>> be relatively uniform. A user might have to maintain both x86 and arm64, and
>>>>>> so on. It's not a good thing that the difference is too big.
>>>>>
>>>>> There will be some difference as the 4G limit doesn't always hold for
>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
>>>>> things a bit while following the documented behaviour:
>>>>>
>>>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
>>>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
>>>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
>>>>>
>>>>> There is no fallback from crashkernel=Y.
>>>>>
>>>>> The question is whether we still want a default low allocation if
>>>>> crashkernel=Y,low is missing but 'high' is present. If we add this, I
>>>>> think we'd be consistent with kernel-parameters.txt for the 'low'
>>>>> description. A default 'low' is probably not that bad but I'm tempted to
>>>>> always mandate both 'high' and 'low'.
>>>>
>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
>>>> about this version. And I have the same concerns about them which comes
>>>> from below points:
>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
>>>> deploy/configure kdump on different machines of different ARCHes in the
>>>> same LAB. I think we should try to avoid the confusion.
>>
>> Yes, but for someone who is configuring crashkernel= for the first time, he
>> needs to read doc to understand how to configure it. The doc can show the
>> recommended default value of 'low' size.
>>
>> After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"),
>> the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size()
>> is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low.
>>
>>
>> +                * -swiotlb size: user-specified with swiotlb= or default.
>> -               low_size = swiotlb_size_or_default() + (8UL<<20);
>> +               low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20);
>>
>> That means all ARCHs can explicit configure crashkernel=256M,low, instead of
>> omitting it. This may be another way to avoid confusion. It's not hard for
>> programmer-turned-user/admin. However, this requires us to forgo backward
>> compatibility with the default size of 'low'.
> 
> We can make ,high and ,low simpler at first as they are alternative. If
> possible, we can also simplify the ,high ,low implementation on x86_64
> if it truly brings better archievement on arm64.

OK, I plan to remove optimization, fallback and default low size, to follow the
suggestion of Catalin first. But there's one minor point of contention.

1)    Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
2)    Both "crashkernel=X,high" and "crashkernel=X,low" are present.
   or
      Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero.

I prefer 2), how about you?

> 
>>
>>
>>>> 2) Fallback behaviour is important to our distros. The reason is we will
>>>> provide default value with crashkernel=xxxM along kernel of distros. In
>>>> this case, we hope the reservation will succeed by all means. The ,high
>>>> and ,low is an option if customer likes to take with expertise.
>>
>> OK, I got it.
>>
>>>>
>>>> After going through arm64 memory init code, I got below summary about
>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
>>>> can make use of it to facilitate to simplify code.
>>>> ================================================================================
>>>>                         DMA                      DMA32                    NORMAL
>>>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
>>>> 2)Normal machine        0~4G                     0                        (above 4G)
>>>> 3)Special machine       (above 4G)~MAX
>>>> 4)No DMA|DMA32                                                            (above 4G)~MAX
>>
>> arm64_memblock_init()
>> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
> We don't need different code for this place of reservation as you are
> doing in this patchset, since arm64_dma_phys_limit is initialized as 
> below. In fact, in arm64_memblock_init(), we have made memblock ready,
> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
> memblock_start_of_DRAM() is bigger than 4G, we possibly can call
> reserve_crashkernel() here too.

Yes. Maybe all the devices in this environment are 64-bit. One way I know of allowing
32-bit devices to access high memory without SMMU is: Set a fixed value for the upper
32 bits. In this case, the DMA zone should be [phys_start, phys_start + 4G).

> 
> phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
> 
>> paging_init()                                       |
>> 	map_mem()                                   |
>> unflatten_device_tree or ACPI                       |  ----  //Raspberry Pi4 get dma zone base on dtb or ACPI
>> bootmem_init();                                     |      |
>> 	zone_sizes_init()                           |      |
>> 		of_dma_get_max_cpu_address          |  ----|
>> 		//Update arm64_dma_phys_limit       |  ----|
>> 	reserve_crashkernel()        <--------------  //Because we need arm64_dma_phys_limit to be updated above
>> request_standard_resources()
> 
> Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case
> as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But 
> arm64_dma_phys_limit could be 1G or 4G, that's why your optimization
> about BLOCKING may not be right since you assume the 4G boundary, while
> forgetting Raspberry Pi4 on which 1G is the boundary of low memory and

No, no, my optimization for Raspberry Pi4 is fine. I do page mapping for memory
under 4G and block mapping for memory above 4G. But still try to reserve crash
low memory from DMA zone. So when 1G is the boundary, it's just not fully optimized,
the memory 1-4G are mapped as page.

> high memory. So separating out BLOCKING optimization can let us focus on
> the crashkernel,high support.
> 
>>
>>>>
>>>> -------------------------------------------
>>>>                       arm64_dma_phys_limit
>>>> 1)Raspberry Pi4         1G                     
>>>> 2)Normal machine        4G                     
>>>> 3)Special machine       MAX
>>>> 4)No DMA|DMA32          MAX
>>>>
>>>> Note: 3)Special machine means the machine's starting physical address is above 4G.
>>>> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
>>>> IOMMU hardware supporting.
>>>> ===================================================================================
>>>>
>>>> I made a draft patch based on this patchset, please feel free to check and
>>>> see if it's OK, or anything missing or wrongly understood. I removed
>>>> reserve_crashkernel_high() and only keep reserve_crashkernel() and
>>>> reserve_crashkernel_low() as the v21 did.
>>>
>>> Sorry, forgot attaching the draft patch.
>>>
>>> By the way, we can also have a simple version with basic ,high, ,low
>>> support, no fallback. We can add fallback and other optimization later.
>>> This can be plan B.
>>
>> Yes, That's what Catalin suggested also.
>>
>> Hi, Baoquan He:
>>   Without optimization, the whole Patch 3-4 and 6-7 can be dropped.
>>
>> Process after abstraction:
>> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
>> 		reserve_crashkernel()
>> 		//block mapping
>> 	} else {
>> 		//page mapping
>> 		reserve_crashkernel()
>> 	}
>>
>> ------------ Simplified real-world process ---------
> Yeah, this looks clearer. I would like to see a version with them.
> 
>> arm64_memblock_init()
>         Before reserve_crashkernel(), we can update arm64_dma_phys_limit
> as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or
> memblock_start_of_DRAM() is bigger than 4G.
>         Then we go with:
>         if (!arm64_dma_phys_limit)
> 		reserve_crashkernel();
> 
> Just personal opinion, please check if it's appropriate to handle case
> 3) which has physical starting memory above 4G here. 

OK, I'll write it down and consider it in the future optimization.

> 
>> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
>> 		reserve_crashkernel()
>            
>> paging_init()
>> 	map_mem()
>> 		if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
>> 			//block mapping
>> 		else
>> 			//page mapping
>> unflatten_device_tree or ACPI
>> bootmem_init();
>> 	zone_sizes_init()
>> 		of_dma_get_max_cpu_address
>> 		//Update arm64_dma_phys_limit
>> 	if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
>> 		reserve_crashkernel()
> 
> The rest sounds good with optimization code split out.
> 
> .
>
Leizhen (ThunderTown) April 29, 2022, 8:25 a.m. UTC | #18
On 2022/4/29 16:02, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/4/29 11:24, Baoquan He wrote:
>> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2022/4/28 11:52, Baoquan He wrote:
>>>> On 04/28/22 at 11:40am, Baoquan He wrote:
>>>>> Hi Catalin, Zhen Lei,
>>>>>
>>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
>>>>>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
>>>>>>> On 2022/4/27 20:32, Catalin Marinas wrote:
>>>>>>>> I think one could always pass a default command line like:
>>>>>>>>
>>>>>>>> 	crashkernel=1G,high crashkernel=128M,low
>>>>>>>>
>>>>>>>> without much knowledge of the SoC memory layout.
>>>>>>>
>>>>>>> Yes, that's what the end result is. The user specify crashkernel=128M,low
>>>>>>> and the implementation ensure the 128M low memory is allocated from DMA zone.
>>>>>>> We use arm64_dma_phys_limit as the upper limit for crash low memory.
>>>>>>>
>>>>>>> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
>>>>>>> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>>>>>>> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>>>>>>>                                                crash_base, crash_max);
>>>>>>>
>>>>>>>> Another option is to only introduce crashkernel=Y,low and, when that is
>>>>>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
>>>>>>>> 'high' option at all:
>>>>>>>>
>>>>>>>> 	crashkernel=1G				- all within ZONE_DMA
>>>>>>>> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
>>>>>>>> 						  1G above ZONE_DMA
>>>>>>>>
>>>>>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
>>>>>>>> the 'low' option.
>>>>>>>
>>>>>>> I think although the code is hard to make generic, the interface is better to
>>>>>>> be relatively uniform. A user might have to maintain both x86 and arm64, and
>>>>>>> so on. It's not a good thing that the difference is too big.
>>>>>>
>>>>>> There will be some difference as the 4G limit doesn't always hold for
>>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
>>>>>> things a bit while following the documented behaviour:
>>>>>>
>>>>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
>>>>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
>>>>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
>>>>>>
>>>>>> There is no fallback from crashkernel=Y.
>>>>>>
>>>>>> The question is whether we still want a default low allocation if
>>>>>> crashkernel=Y,low is missing but 'high' is present. If we add this, I
>>>>>> think we'd be consistent with kernel-parameters.txt for the 'low'
>>>>>> description. A default 'low' is probably not that bad but I'm tempted to
>>>>>> always mandate both 'high' and 'low'.
>>>>>
>>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
>>>>> about this version. And I have the same concerns about them which comes
>>>>> from below points:
>>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
>>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
>>>>> deploy/configure kdump on different machines of different ARCHes in the
>>>>> same LAB. I think we should try to avoid the confusion.
>>>
>>> Yes, but for someone who is configuring crashkernel= for the first time, he
>>> needs to read doc to understand how to configure it. The doc can show the
>>> recommended default value of 'low' size.
>>>
>>> After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"),
>>> the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size()
>>> is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low.
>>>
>>>
>>> +                * -swiotlb size: user-specified with swiotlb= or default.
>>> -               low_size = swiotlb_size_or_default() + (8UL<<20);
>>> +               low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20);
>>>
>>> That means all ARCHs can explicit configure crashkernel=256M,low, instead of
>>> omitting it. This may be another way to avoid confusion. It's not hard for
>>> programmer-turned-user/admin. However, this requires us to forgo backward
>>> compatibility with the default size of 'low'.
>>
>> We can make ,high and ,low simpler at first as they are alternative. If
>> possible, we can also simplify the ,high ,low implementation on x86_64
>> if it truly brings better archievement on arm64.
> 
> OK, I plan to remove optimization, fallback and default low size, to follow the
> suggestion of Catalin first. But there's one minor point of contention.
> 
> 1)    Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
> 2)    Both "crashkernel=X,high" and "crashkernel=X,low" are present.
>    or
>       Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero.
> 
> I prefer 2), how about you?
> 
>>
>>>
>>>
>>>>> 2) Fallback behaviour is important to our distros. The reason is we will
>>>>> provide default value with crashkernel=xxxM along kernel of distros. In
>>>>> this case, we hope the reservation will succeed by all means. The ,high
>>>>> and ,low is an option if customer likes to take with expertise.
>>>
>>> OK, I got it.
>>>
>>>>>
>>>>> After going through arm64 memory init code, I got below summary about
>>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
>>>>> can make use of it to facilitate to simplify code.
>>>>> ================================================================================
>>>>>                         DMA                      DMA32                    NORMAL
>>>>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
>>>>> 2)Normal machine        0~4G                     0                        (above 4G)
>>>>> 3)Special machine       (above 4G)~MAX
>>>>> 4)No DMA|DMA32                                                            (above 4G)~MAX
>>>
>>> arm64_memblock_init()
>>> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
>> We don't need different code for this place of reservation as you are
>> doing in this patchset, since arm64_dma_phys_limit is initialized as 
>> below. In fact, in arm64_memblock_init(), we have made memblock ready,
>> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
>> memblock_start_of_DRAM() is bigger than 4G, we possibly can call
>> reserve_crashkernel() here too.
> 
> Yes. Maybe all the devices in this environment are 64-bit. One way I know of allowing
> 32-bit devices to access high memory without SMMU is: Set a fixed value for the upper
> 32 bits. In this case, the DMA zone should be [phys_start, phys_start + 4G).

I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA offsets in the max_zone_phys() calculation")

    Currently, the kernel assumes that if RAM starts above 32-bit (or
    zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
    such constrained devices have a hardwired DMA offset. In practice, we
    haven't noticed any such hardware so let's assume that we can expand
    ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
    ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
    zone_bits.

So your suggestion is feasible.

> 
>>
>> phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
>>
>>> paging_init()                                       |
>>> 	map_mem()                                   |
>>> unflatten_device_tree or ACPI                       |  ----  //Raspberry Pi4 get dma zone base on dtb or ACPI
>>> bootmem_init();                                     |      |
>>> 	zone_sizes_init()                           |      |
>>> 		of_dma_get_max_cpu_address          |  ----|
>>> 		//Update arm64_dma_phys_limit       |  ----|
>>> 	reserve_crashkernel()        <--------------  //Because we need arm64_dma_phys_limit to be updated above
>>> request_standard_resources()
>>
>> Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case
>> as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But 
>> arm64_dma_phys_limit could be 1G or 4G, that's why your optimization
>> about BLOCKING may not be right since you assume the 4G boundary, while
>> forgetting Raspberry Pi4 on which 1G is the boundary of low memory and
> 
> No, no, my optimization for Raspberry Pi4 is fine. I do page mapping for memory
> under 4G and block mapping for memory above 4G. But still try to reserve crash
> low memory from DMA zone. So when 1G is the boundary, it's just not fully optimized,
> the memory 1-4G are mapped as page.
> 
>> high memory. So separating out BLOCKING optimization can let us focus on
>> the crashkernel,high support.
>>
>>>
>>>>>
>>>>> -------------------------------------------
>>>>>                       arm64_dma_phys_limit
>>>>> 1)Raspberry Pi4         1G                     
>>>>> 2)Normal machine        4G                     
>>>>> 3)Special machine       MAX
>>>>> 4)No DMA|DMA32          MAX
>>>>>
>>>>> Note: 3)Special machine means the machine's starting physical address is above 4G.
>>>>> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
>>>>> IOMMU hardware supporting.
>>>>> ===================================================================================
>>>>>
>>>>> I made a draft patch based on this patchset, please feel free to check and
>>>>> see if it's OK, or anything missing or wrongly understood. I removed
>>>>> reserve_crashkernel_high() and only keep reserve_crashkernel() and
>>>>> reserve_crashkernel_low() as the v21 did.
>>>>
>>>> Sorry, forgot attaching the draft patch.
>>>>
>>>> By the way, we can also have a simple version with basic ,high, ,low
>>>> support, no fallback. We can add fallback and other optimization later.
>>>> This can be plan B.
>>>
>>> Yes, That's what Catalin suggested also.
>>>
>>> Hi, Baoquan He:
>>>   Without optimization, the whole Patch 3-4 and 6-7 can be dropped.
>>>
>>> Process after abstraction:
>>> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
>>> 		reserve_crashkernel()
>>> 		//block mapping
>>> 	} else {
>>> 		//page mapping
>>> 		reserve_crashkernel()
>>> 	}
>>>
>>> ------------ Simplified real-world process ---------
>> Yeah, this looks clearer. I would like to see a version with them.
>>
>>> arm64_memblock_init()
>>         Before reserve_crashkernel(), we can update arm64_dma_phys_limit
>> as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or
>> memblock_start_of_DRAM() is bigger than 4G.
>>         Then we go with:
>>         if (!arm64_dma_phys_limit)
>> 		reserve_crashkernel();
>>
>> Just personal opinion, please check if it's appropriate to handle case
>> 3) which has physical starting memory above 4G here. 
> 
> OK, I'll write it down and consider it in the future optimization.
> 
>>
>>> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
>>> 		reserve_crashkernel()
>>            
>>> paging_init()
>>> 	map_mem()
>>> 		if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
>>> 			//block mapping
>>> 		else
>>> 			//page mapping
>>> unflatten_device_tree or ACPI
>>> bootmem_init();
>>> 	zone_sizes_init()
>>> 		of_dma_get_max_cpu_address
>>> 		//Update arm64_dma_phys_limit
>>> 	if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
>>> 		reserve_crashkernel()
>>
>> The rest sounds good with optimization code split out.
>>
>> .
>>
>
Catalin Marinas May 3, 2022, 10 p.m. UTC | #19
On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote:
> On 2022/4/29 16:02, Leizhen (ThunderTown) wrote:
> > On 2022/4/29 11:24, Baoquan He wrote:
> >> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
> >>> On 2022/4/28 11:52, Baoquan He wrote:
> >>>> On 04/28/22 at 11:40am, Baoquan He wrote:
> >>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
> >>>>>> There will be some difference as the 4G limit doesn't always hold for
> >>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
> >>>>>> things a bit while following the documented behaviour:
> >>>>>>
> >>>>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
> >>>>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> >>>>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
[...]
> >>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
> >>>>> about this version. And I have the same concerns about them which comes
> >>>>> from below points:
> >>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
> >>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
> >>>>> deploy/configure kdump on different machines of different ARCHes in the
> >>>>> same LAB. I think we should try to avoid the confusion.

I guess by all arches you mean just x86 here. Since the code is not
generic, all arches do their own stuff.

> > OK, I plan to remove optimization, fallback and default low size, to follow the
> > suggestion of Catalin first. But there's one minor point of contention.
> > 
> > 1)    Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
> > 2)    Both "crashkernel=X,high" and "crashkernel=X,low" are present.
> >    or
> >       Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero.
> > 
> > I prefer 2), how about you?

(2) works for me as well. We keep these simple as "expert" options and
allow crashkernel= to fall back to 'high' if not sufficient memory in
ZONE_DMA. That would be a slight change from the current behaviour but,
as Zhen Lei said, with the old tools it's just moving the error around,
the crashkernel wouldn't be available in either case.

> >>>>> 2) Fallback behaviour is important to our distros. The reason is we will
> >>>>> provide default value with crashkernel=xxxM along kernel of distros. In
> >>>>> this case, we hope the reservation will succeed by all means. The ,high
> >>>>> and ,low is an option if customer likes to take with expertise.

OK, that's good feedback.

So, to recap, IIUC you are fine with:

	crashkernel=Y		- allocate within ZONE_DMA with fallback
				  above with a default in ZONE_DMA (like
				  x86, 256M or swiotlb size)
	crashkernel=Y,high	- allocate from above ZONE_DMA
	crashkernel=Y,low	- allocate within ZONE_DMA

'crashkernel' overrides the high and low while the latter two can be
passed independently.

> >>>>> After going through arm64 memory init code, I got below summary about
> >>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
> >>>>> can make use of it to facilitate to simplify code.
> >>>>> ================================================================================
> >>>>>                         DMA                      DMA32                    NORMAL
> >>>>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
> >>>>> 2)Normal machine        0~4G                     0                        (above 4G)
> >>>>> 3)Special machine       (above 4G)~MAX
> >>>>> 4)No DMA|DMA32                                                            (above 4G)~MAX
> >>>
> >>> arm64_memblock_init()
> >>> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
> >> We don't need different code for this place of reservation as you are
> >> doing in this patchset, since arm64_dma_phys_limit is initialized as 
> >> below. In fact, in arm64_memblock_init(), we have made memblock ready,
> >> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
> >> memblock_start_of_DRAM() is bigger than 4G, we possibly can call
> >> reserve_crashkernel() here too.
> > 
> > Yes. Maybe all the devices in this environment are 64-bit. One way I
> > know of allowing 32-bit devices to access high memory without SMMU
> > is: Set a fixed value for the upper 32 bits. In this case, the DMA
> > zone should be [phys_start, phys_start + 4G).

We decided that this case doesn't really exists for arm64 platforms (no
need for special ZONE_DMA).

> I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA
> offsets in the max_zone_phys() calculation")
> 
>     Currently, the kernel assumes that if RAM starts above 32-bit (or
>     zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
>     such constrained devices have a hardwired DMA offset. In practice, we
>     haven't noticed any such hardware so let's assume that we can expand
>     ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
>     ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
>     zone_bits.

I think the above log is slightly confusing. If the DRAM starts above
4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but
above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes
ZONE_DMA to 4G. I don't remember why we did this last part, maybe in
case we get incorrect firmware tables, otherwise we could have extended
ZONE_DMA to end of DRAM.

Zhen Lei, if we agreed on the crashkernel behaviour, could you please
post a series that does the above parsing allocation? Ignore the
optimisations, we can look at them afterwards.

Thanks.
Leizhen (ThunderTown) May 5, 2022, 2:13 a.m. UTC | #20
On 2022/5/4 6:00, Catalin Marinas wrote:
> On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote:
>> On 2022/4/29 16:02, Leizhen (ThunderTown) wrote:
>>> On 2022/4/29 11:24, Baoquan He wrote:
>>>> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
>>>>> On 2022/4/28 11:52, Baoquan He wrote:
>>>>>> On 04/28/22 at 11:40am, Baoquan He wrote:
>>>>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
>>>>>>>> There will be some difference as the 4G limit doesn't always hold for
>>>>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
>>>>>>>> things a bit while following the documented behaviour:
>>>>>>>>
>>>>>>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
>>>>>>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
>>>>>>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
> [...]
>>>>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
>>>>>>> about this version. And I have the same concerns about them which comes
>>>>>>> from below points:
>>>>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
>>>>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
>>>>>>> deploy/configure kdump on different machines of different ARCHes in the
>>>>>>> same LAB. I think we should try to avoid the confusion.
> 
> I guess by all arches you mean just x86 here. Since the code is not
> generic, all arches do their own stuff.
> 
>>> OK, I plan to remove optimization, fallback and default low size, to follow the
>>> suggestion of Catalin first. But there's one minor point of contention.
>>>
>>> 1)    Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
>>> 2)    Both "crashkernel=X,high" and "crashkernel=X,low" are present.
>>>    or
>>>       Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero.
>>>
>>> I prefer 2), how about you?
> 
> (2) works for me as well. We keep these simple as "expert" options and

Okay, so I'll follow 2) to update the code.

> allow crashkernel= to fall back to 'high' if not sufficient memory in
> ZONE_DMA. That would be a slight change from the current behaviour but,
> as Zhen Lei said, with the old tools it's just moving the error around,
> the crashkernel wouldn't be available in either case.
> 
>>>>>>> 2) Fallback behaviour is important to our distros. The reason is we will
>>>>>>> provide default value with crashkernel=xxxM along kernel of distros. In
>>>>>>> this case, we hope the reservation will succeed by all means. The ,high
>>>>>>> and ,low is an option if customer likes to take with expertise.
> 
> OK, that's good feedback.
> 
> So, to recap, IIUC you are fine with:
> 
> 	crashkernel=Y		- allocate within ZONE_DMA with fallback
> 				  above with a default in ZONE_DMA (like
> 				  x86, 256M or swiotlb size)
> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> 	crashkernel=Y,low	- allocate within ZONE_DMA
> 
> 'crashkernel' overrides the high and low while the latter two can be
> passed independently.
> 
>>>>>>> After going through arm64 memory init code, I got below summary about
>>>>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
>>>>>>> can make use of it to facilitate to simplify code.
>>>>>>> ================================================================================
>>>>>>>                         DMA                      DMA32                    NORMAL
>>>>>>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
>>>>>>> 2)Normal machine        0~4G                     0                        (above 4G)
>>>>>>> 3)Special machine       (above 4G)~MAX
>>>>>>> 4)No DMA|DMA32                                                            (above 4G)~MAX
>>>>>
>>>>> arm64_memblock_init()
>>>>> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
>>>> We don't need different code for this place of reservation as you are
>>>> doing in this patchset, since arm64_dma_phys_limit is initialized as 
>>>> below. In fact, in arm64_memblock_init(), we have made memblock ready,
>>>> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
>>>> memblock_start_of_DRAM() is bigger than 4G, we possibly can call
>>>> reserve_crashkernel() here too.
>>>
>>> Yes. Maybe all the devices in this environment are 64-bit. One way I
>>> know of allowing 32-bit devices to access high memory without SMMU
>>> is: Set a fixed value for the upper 32 bits. In this case, the DMA
>>> zone should be [phys_start, phys_start + 4G).
> 
> We decided that this case doesn't really exists for arm64 platforms (no
> need for special ZONE_DMA).
> 
>> I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA
>> offsets in the max_zone_phys() calculation")
>>
>>     Currently, the kernel assumes that if RAM starts above 32-bit (or
>>     zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
>>     such constrained devices have a hardwired DMA offset. In practice, we
>>     haven't noticed any such hardware so let's assume that we can expand
>>     ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
>>     ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
>>     zone_bits.
> 
> I think the above log is slightly confusing. If the DRAM starts above
> 4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but
> above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes
> ZONE_DMA to 4G. I don't remember why we did this last part, maybe in
> case we get incorrect firmware tables, otherwise we could have extended
> ZONE_DMA to end of DRAM.
> 
> Zhen Lei, if we agreed on the crashkernel behaviour, could you please
> post a series that does the above parsing allocation? Ignore the
> optimisations, we can look at them afterwards.

OK, I've changed the code before the festival, and I'll test it today.

> 
> Thanks.
>
Baoquan He May 5, 2022, 3 a.m. UTC | #21
On 05/03/22 at 11:00pm, Catalin Marinas wrote:
> On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote:
> > On 2022/4/29 16:02, Leizhen (ThunderTown) wrote:
> > > On 2022/4/29 11:24, Baoquan He wrote:
> > >> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
> > >>> On 2022/4/28 11:52, Baoquan He wrote:
> > >>>> On 04/28/22 at 11:40am, Baoquan He wrote:
> > >>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
> > >>>>>> There will be some difference as the 4G limit doesn't always hold for
> > >>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
> > >>>>>> things a bit while following the documented behaviour:
> > >>>>>>
> > >>>>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
> > >>>>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
> > >>>>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
> [...]
> > >>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
> > >>>>> about this version. And I have the same concerns about them which comes
> > >>>>> from below points:
> > >>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
> > >>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
> > >>>>> deploy/configure kdump on different machines of different ARCHes in the
> > >>>>> same LAB. I think we should try to avoid the confusion.
> 
> I guess by all arches you mean just x86 here. Since the code is not
> generic, all arches do their own stuff.

Right. Since currently only x86 has crashkernel,high|low support. From
the distros and customer's point of view, we would like to see the same
feature has the same or similar behaviour. This will ease operation and
maintaining. E.g on the cloud platform, the base of it could be any
ARCH, x86, arm64. The inconsistent behaviour could cause confusion.
Certainly, the underlying implementation may be different.

Surely, if arm64 has its own manner because of reasons, we can
add document to note that.

> 
> > > OK, I plan to remove optimization, fallback and default low size, to follow the
> > > suggestion of Catalin first. But there's one minor point of contention.
> > > 
> > > 1)    Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
> > > 2)    Both "crashkernel=X,high" and "crashkernel=X,low" are present.
> > >    or
> > >       Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero.
> > > 
> > > I prefer 2), how about you?
> 
> (2) works for me as well. We keep these simple as "expert" options and
> allow crashkernel= to fall back to 'high' if not sufficient memory in
> ZONE_DMA. That would be a slight change from the current behaviour but,
> as Zhen Lei said, with the old tools it's just moving the error around,
> the crashkernel wouldn't be available in either case.
> 
> > >>>>> 2) Fallback behaviour is important to our distros. The reason is we will
> > >>>>> provide default value with crashkernel=xxxM along kernel of distros. In
> > >>>>> this case, we hope the reservation will succeed by all means. The ,high
> > >>>>> and ,low is an option if customer likes to take with expertise.
> 
> OK, that's good feedback.
> 
> So, to recap, IIUC you are fine with:
> 
> 	crashkernel=Y		- allocate within ZONE_DMA with fallback
> 				  above with a default in ZONE_DMA (like
> 				  x86, 256M or swiotlb size)
                                  
        Ack to this one. 


> 	crashkernel=Y,high	- allocate from above ZONE_DMA
                                  
        Not exactly. If there's only ZONE_DMA, crashkernel,high will
        be reserved in ZONE_DMA, and crashkernel,low will be ignored.
        Other than this, ack.

> 	crashkernel=Y,low	- allocate within ZONE_DMA

        Ack to this one.
> 
> 'crashkernel' overrides the high and low while the latter two can be
> passed independently.

        crashkernel=,high can be passed independently, then a crashkernel=,low
        is needed implicitly. If people don't want crashkernel=,low
        explicitly, crashkernel=0,low need be specified.

        An independent crashkernel=,low makes no sense. Crashkernel=,low
        should be paird with crashkernel=,high.
        
        My personal opinion according to the existed senmantics on x86.
        Otherwise, the guidance of crashkernel= |,high|,low reservation
        will be complicated to write.

> 
> > >>>>> After going through arm64 memory init code, I got below summary about
> > >>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
> > >>>>> can make use of it to facilitate to simplify code.
> > >>>>> ================================================================================
> > >>>>>                         DMA                      DMA32                    NORMAL
> > >>>>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
> > >>>>> 2)Normal machine        0~4G                     0                        (above 4G)
> > >>>>> 3)Special machine       (above 4G)~MAX
> > >>>>> 4)No DMA|DMA32                                                            (above 4G)~MAX
> > >>>
> > >>> arm64_memblock_init()
> > >>> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
> > >> We don't need different code for this place of reservation as you are
> > >> doing in this patchset, since arm64_dma_phys_limit is initialized as 
> > >> below. In fact, in arm64_memblock_init(), we have made memblock ready,
> > >> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
> > >> memblock_start_of_DRAM() is bigger than 4G, we possibly can call
> > >> reserve_crashkernel() here too.
> > > 
> > > Yes. Maybe all the devices in this environment are 64-bit. One way I
> > > know of allowing 32-bit devices to access high memory without SMMU
> > > is: Set a fixed value for the upper 32 bits. In this case, the DMA
> > > zone should be [phys_start, phys_start + 4G).
> 
> We decided that this case doesn't really exists for arm64 platforms (no
> need for special ZONE_DMA).
> 
> > I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA
> > offsets in the max_zone_phys() calculation")
> > 
> >     Currently, the kernel assumes that if RAM starts above 32-bit (or
> >     zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
> >     such constrained devices have a hardwired DMA offset. In practice, we
> >     haven't noticed any such hardware so let's assume that we can expand
> >     ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
> >     ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
> >     zone_bits.
> 
> I think the above log is slightly confusing. If the DRAM starts above
> 4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but
> above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes
> ZONE_DMA to 4G. I don't remember why we did this last part, maybe in
> case we get incorrect firmware tables, otherwise we could have extended
> ZONE_DMA to end of DRAM.
> 
> Zhen Lei, if we agreed on the crashkernel behaviour, could you please
> post a series that does the above parsing allocation? Ignore the
> optimisations, we can look at them afterwards.
> 
> Thanks.
> 
> -- 
> Catalin
>
Catalin Marinas May 5, 2022, 2:20 p.m. UTC | #22
On Thu, May 05, 2022 at 11:00:19AM +0800, Baoquan He wrote:
> On 05/03/22 at 11:00pm, Catalin Marinas wrote:
> > So, to recap, IIUC you are fine with:
> > 
> > 	crashkernel=Y		- allocate within ZONE_DMA with fallback
> > 				  above with a default in ZONE_DMA (like
> > 				  x86, 256M or swiotlb size)
> 
>         Ack to this one.
> 
> 
> > 	crashkernel=Y,high	- allocate from above ZONE_DMA
> 
>         Not exactly. If there's only ZONE_DMA, crashkernel,high will
>         be reserved in ZONE_DMA, and crashkernel,low will be ignored.
>         Other than this, ack.

Yes, that's fine.

> > 	crashkernel=Y,low	- allocate within ZONE_DMA
> 
>         Ack to this one.
> > 
> > 'crashkernel' overrides the high and low while the latter two can be
> > passed independently.
> 
>         crashkernel=,high can be passed independently, then a crashkernel=,low
>         is needed implicitly. If people don't want crashkernel=,low
>         explicitly, crashkernel=0,low need be specified.

I find this complicating the interface. I don't know the background to
the x86 implementation but we diverge already on arm64 since we talk
about ZONE_DMA rather than 4G limit (though for most platforms these
would be the same).

I guess we could restate the difference between crashkernel= and
crashkernel=,high as the hint to go for allocation above ZONE_DMA first.

>         An independent crashkernel=,low makes no sense. Crashkernel=,low
>         should be paird with crashkernel=,high.

You could argue that crashkernel=,low gives the current crashkernel=
behaviour, i.e. either all within ZONE_DMA or fail to allocate. So it
may have some value on its own.

>         My personal opinion according to the existed senmantics on x86.
>         Otherwise, the guidance of crashkernel= |,high|,low reservation
>         will be complicated to write.

It's more that I find the current semantics unnecessarily confusing. But
even reading the x86_64 text it's not that clear. For example the
default low allocation for crashkernel= and crashkernel=,high is only
mentioned in the crashkernel=,low description.
Baoquan He May 6, 2022, 11:39 a.m. UTC | #23
On 05/05/22 at 03:20pm, Catalin Marinas wrote:
> On Thu, May 05, 2022 at 11:00:19AM +0800, Baoquan He wrote:
> > On 05/03/22 at 11:00pm, Catalin Marinas wrote:
> > > So, to recap, IIUC you are fine with:
> > > 
> > > 	crashkernel=Y		- allocate within ZONE_DMA with fallback
> > > 				  above with a default in ZONE_DMA (like
> > > 				  x86, 256M or swiotlb size)
> > 
> >         Ack to this one.
> > 
> > 
> > > 	crashkernel=Y,high	- allocate from above ZONE_DMA
> > 
> >         Not exactly. If there's only ZONE_DMA, crashkernel,high will
> >         be reserved in ZONE_DMA, and crashkernel,low will be ignored.
> >         Other than this, ack.
> 
> Yes, that's fine.
> 
> > > 	crashkernel=Y,low	- allocate within ZONE_DMA
> > 
> >         Ack to this one.
> > > 
> > > 'crashkernel' overrides the high and low while the latter two can be
> > > passed independently.
> > 
> >         crashkernel=,high can be passed independently, then a crashkernel=,low
> >         is needed implicitly. If people don't want crashkernel=,low
> >         explicitly, crashkernel=0,low need be specified.
> 
> I find this complicating the interface. I don't know the background to
> the x86 implementation but we diverge already on arm64 since we talk
> about ZONE_DMA rather than 4G limit (though for most platforms these
> would be the same).
> 
> I guess we could restate the difference between crashkernel= and
> crashkernel=,high as the hint to go for allocation above ZONE_DMA first.

Yes, rethinking about this, we can make a straightforward and simpler
crashkernel=,high|,low on arm64, namely asking for user to clearly
specify them.

During maintenance of crashkernel= parameter in our distros, we found
crashkernel=xM is used mostly since most of systems can be satisfied
with 256M or a little more for kdump. While on some big end servers,
1G or more crashkernel memory is needed. In this case, crashkernel=,high
is taken. We don't want to reserve so much low memory during system
running while just waiting in case rare crash happened. crashkernel=,high
is rarely used, so making it simple and not so flexible is not so bad.
We can improve it later with justification.

> 
> >         An independent crashkernel=,low makes no sense. Crashkernel=,low
> >         should be paird with crashkernel=,high.
> 
> You could argue that crashkernel=,low gives the current crashkernel=
> behaviour, i.e. either all within ZONE_DMA or fail to allocate. So it
> may have some value on its own.

Yes, crashkernel=,low has the same behaviour as the current crashkernel=
if we decide not to add fallback mechanism to it. The purpose of
crahskernel=,low is to assist crashkernel=,high to get kdump kernel
boot up with satisfing DMA allocation. While allowing independent
crashkernel=,low will add it another mission, limiting crashkernel only
reserved in low memory. Up to now, we don't see the need for that.

> 
> >         My personal opinion according to the existed senmantics on x86.
> >         Otherwise, the guidance of crashkernel= |,high|,low reservation
> >         will be complicated to write.
> 
> It's more that I find the current semantics unnecessarily confusing. But
> even reading the x86_64 text it's not that clear. For example the
> default low allocation for crashkernel= and crashkernel=,high is only
> mentioned in the crashkernel=,low description.

Yeah, we can improve those document if insufficiency is found.

By the way, with my observation, crashkernel= with fallback meet
99% of our needs. If people really need more than 512M memory or more,
then please consider crashkernel=,high. Basically on servers, low memory
is limited, while high memory is very big.

So I agree with you that we can make it step by step, firstly adding
basic crashkernel=,high and ,low support. We can add those complicated
cases later.