Message ID | 20220414115720.1887-1-thunder.leizhen@huawei.com |
---|---|
Headers | show |
Series | support reserving crashkernel above 4G on arm64 kdump | expand |
For the series: Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com> Thanks for keeping this patch set alive. On 4/14/22 6:57AM, Zhen Lei wrote: > Changes since [v21]: > 1. Update the commit message of patch 1 and 5. > 2. Add some comments for reserve_crashkernel() in patch 5. > > Thanks to Baoquan He and John Donnelly for their review comments. > > Because v5.18-rc1 has added a new patch > commit 031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones") > There are many new scenarios: > 1) The mappings may be block or page-level. > 2) The call to reserve_crashkernel() may or may not be deferred. > 3) The the upper limit of DMA address may be 4G, or less than 4G. Or the > upper limit of physical memory, because SMMU can do the mapping. > > The code of patch 1-2, 8-9 keep no change, because the above-mentioned issues are not involved. > The code of patch 5 only makes the following changes: > - if (crash_base >= SZ_4G) > + /* > + * When both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the > + * CRASH_ADDR_LOW_MAX equals the upper limit of physical memory, so > + * the 'crash_base' of high memory can not exceed it. To follow the > + * description of "crashkernel=X,high" option, add below 'high' > + * condition to make sure the crash low memory will be reserved. > + */ > + if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) { > Change SZ_4G to CRASH_ADDR_LOW_MAX, because arm64_dma_phys_limit may be less than > 4G or greater than 4G. The check 'high' is used for "crashkernel=X,high" and > "(crash_base >= CRASH_ADDR_LOW_MAX)" is used for "crashkernel=X[@offset]". > > Patch 3-4 to allow block mappings for memory above 4G. > Patch 6-7 to support only crash high memory or fixed memory range specified by > crashkernel=X@offset use page-level mapping, to allow other areas use block mapping. > These four patches are for performance optimization purposes. For details about the > technical feasibility analysis, please see the commit messages. > > Now the implementation of arm64 is very different from that of x86. It's no longer > suitable for both of them to share code. > > > > Changes since [v20]: > 1. Check whether crashkernel=Y,low is incorrectly configured or not configured. Do different processing. > 2. Share the existing description of x86. The configuration of arm64 is the same as that of x86. > 3. Define the value of macro CRASH_ADDR_HIGH_MAX as memblock.current_limit, instead of MEMBLOCK_ALLOC_ACCESSIBLE. > 4. To improve readability, some lightweight code adjustments have been made to reserve_craskernel(), including comments. > 5. The defined value of DEFAULT_CRASH_KERNEL_LOW_SIZE reconsiders swiotlb, just like x86, to share documents. > > Thanks to Baoquan He for his careful review. > > The test cases are as follows: (Please update the kexec tool to the latest version) > 1) crashkernel=4G //high=4G, low=256M > 2) crashkernel=4G crashkernel=512M,high crashkernel=512M,low //high=4G, low=256M, high and low are ignored > 3) crashkernel=4G crashkernel=512M,high //high=4G, low=256M, high is ignored > 4) crashkernel=4G crashkernel=512M,low //high=4G, low=256M, low is ignored > 5) crashkernel=4G@0xe0000000 //high=0G, low=0M, cannot allocate, failed > 6) crashkernel=512M //high=0G, low=512M > 7) crashkernel=128M //high=0G, low=128M > 8) crashkernel=512M@0xde000000 //512M@3552M //high=0G, low=512M > 9) crashkernel=4G,high //high=4G, low=256M > a) crashkernel=4G,high crashkernel=512M,low //high=4G, low=512M > b) crashkernel=512M,high crashkernel=128M,low //high=512M, low=128M > c) crashkernel=128M,high //high=128M, low=256M > d) crashkernel=512M,low //high=0G, low=0M, invalid > e) crashkernel=512M,high crashkernel=0,low //high=512M, low=0M > f) crashkernel=4G,high crashkernel=ab,low //high=0G, low=0M, invalid > > > Changes since [v19]: > 1. Temporarily stop making reserve_crashkernel[_low]() generic. There are a > lot of details need to be considered, which can take a long time. Because > "make generic" does not add new functions and does not improve performance, > maybe I should say it's just a cleanup. So by stripping it out and leaving > it for other patches later, we can aggregate the changes to the main functions. > 2. Use insert_resource() to replace request_resource(), this not only simplifies > the code, but also reduces the differences between arm64 and x86 implementations. > 3. As commit 157752d84f5d ("kexec: use Crash kernel for Crash kernel low") do for > x86, we can also extend kexec-tools for arm64, and it's currently applied. See: > https://www.spinics.net/lists/kexec/msg28284.html > > Thank you very much, Borislav Petkov, for so many valuable comments. > > Changes since [v17]: v17 --> v19 > 1. Patch 0001-0004 > Introduce generic parse_crashkernel_high_low() to bring the parsing of > "crashkernel=X,high" and the parsing of "crashkernel=X,low" together, > then use it instead of the call to parse_crashkernel_{high|low}(). Two > confusing parameters of parse_crashkernel_{high|low}() are deleted. > > I previously sent these four patches separately: > [1] https://lkml.org/lkml/2021/12/25/40 > 2. Patch 0005-0009 > Introduce generic reserve_crashkernel_mem[_low](), the implementation of > these two functions is based on function reserve_crashkernel[_low]() in > arch/x86/kernel/setup.c. There is no functional change for x86. > 1) The check position of xen_pv_domain() does not change. > 2) Still 1M alignment for crash kernel fixed region, when 'base' is specified. > > To avoid compilation problems on other architectures: patch 0004 moves > the definition of global variable crashk[_low]_res from kexec_core.c to > crash_core.c, and provide default definitions for all macros involved, a > particular platform can redefine these macros to override the default > values. > 3. 0010, only one line of comment was changed. > 4. 0011 > 1) crashk_low_res may also a valid reserved memory, should be checked > in crash_is_nosave(), see arch/arm64/kernel/machine_kexec. > 2) Drop memblock_mark_nomap() for crashk_low_res, because of: > 2687275a5843 arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required > 3) Also call kmemleak_ignore_phys() for crashk_low_res, because of: > 85f58eb18898 arm64: kdump: Skip kmemleak scan reserved memory for kdump > 5. 0012, slightly rebased, because the following patch is applied in advance. > https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/commit/?h=dt/linus&id=8347b41748c3019157312fbe7f8a6792ae396eb7 > 6. 0013, no change. > > Others: > 1. Discard add ARCH_WANT_RESERVE_CRASH_KERNEL > 2. When allocating crash low memory, the start address still starts from 0. > low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); > 3. Discard change (1ULL << 32) to CRASH_ADDR_LOW_MAX. > 4. Ensure the check position of xen_pv_domain() have no change. > 5. Except patch 0010 and 0012, all "Tested-by", "Reviewed-by", "Acked-by" are removed. > 6. Update description. > > > > Changes since [v16] > - Because no functional changes in this version, so add > "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9 > - Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8 > - Update patch 9 based on the review comments of Rob Herring > - As Catalin Marinas's suggestion, merge the implementation of > ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the > contents of X86 and ARM64 do not overlap, and reduce unnecessary > temporary differences. > > Changes since [v15] > - Aggregate the processing of "linux,usable-memory-range" into one function. > Only patch 9-10 have been updated. > > Changes since [v14] > - Recovering the requirement that the CrashKernel memory regions on X86 > only requires 1 MiB alignment. > - Combine patches 5 and 6 in v14 into one. The compilation warning fixed > by patch 6 was introduced by patch 5 in v14. > - As with crashk_res, crashk_low_res is also processed by > crash_exclude_mem_range() in patch 7. > - Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling") > has removed the architecture-specific code, extend the property "linux,usable-memory-range" > in the platform-agnostic FDT core code. See patch 9. > - Discard the x86 description update in the document, because the description > has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide"). > - Change "arm64" to "ARM64" in Doc. > > > Changes since [v13] > - Rebased on top of 5.11-rc5. > - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL. > Since reserve_crashkernel[_low]() implementations are quite similar on > other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in > arch/Kconfig and select this by X86 and ARM64. > - Some minor cleanup. > > Changes since [v12] > - Rebased on top of 5.10-rc1. > - Keep CRASH_ALIGN as 16M suggested by Dave. > - Drop patch "kdump: add threshold for the required memory". > - Add Tested-by from John. > > Changes since [v11] > - Rebased on top of 5.9-rc4. > - Make the function reserve_crashkernel() of x86 generic. > Suggested by Catalin, make the function reserve_crashkernel() of x86 generic > and arm64 use the generic version to reimplement crashkernel=X. > > Changes since [v10] > - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin. > > Changes since [v9] > - Patch 1 add Acked-by from Dave. > - Update patch 5 according to Dave's comments. > - Update chosen schema. > > Changes since [v8] > - Reuse DT property "linux,usable-memory-range". > Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low > memory region. > - Fix kdump broken with ZONE_DMA reintroduced. > - Update chosen schema. > > Changes since [v7] > - Move x86 CRASH_ALIGN to 2M > Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M. > - Update Documentation/devicetree/bindings/chosen.txt. > Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt > suggested by Arnd. > - Add Tested-by from Jhon and pk. > > Changes since [v6] > - Fix build errors reported by kbuild test robot. > > Changes since [v5] > - Move reserve_crashkernel_low() into kernel/crash_core.c. > - Delete crashkernel=X,high. > - Modify crashkernel=X,low. > If crashkernel=X,low is specified simultaneously, reserve spcified size low > memory for crash kdump kernel devices firstly and then reserve memory above 4G. > In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then > pass to crash dump kernel by DT property "linux,low-memory-range". > - Update Documentation/admin-guide/kdump/kdump.rst. > > Changes since [v4] > - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike. > > Changes since [v3] > - Add memblock_cap_memory_ranges back for multiple ranges. > - Fix some compiling warnings. > > Changes since [v2] > - Split patch "arm64: kdump: support reserving crashkernel above 4G" as > two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate > patch. > > Changes since [v1]: > - Move common reserve_crashkernel_low() code into kernel/kexec_core.c. > - Remove memblock_cap_memory_ranges() i added in v1 and implement that > in fdt_enforce_memory_region(). > There are at most two crash kernel regions, for two crash kernel regions > case, we cap the memory range [min(regs[*].start), max(regs[*].end)] > and then remove the memory range in the middle. > > v1: > There are following issues in arm64 kdump: > 1. We use crashkernel=X to reserve crashkernel below 4G, which > will fail when there is no enough low memory. > 2. If reserving crashkernel above 4G, in this case, crash dump > kernel will boot failure because there is no low memory available > for allocation. > > To solve these issues, change the behavior of crashkernel=X. > crashkernel=X tries low allocation in DMA zone and fall back to high > allocation if it fails. > > We can also use "crashkernel=X,high" to select a high region above > DMA zone, which also tries to allocate at least 256M low memory in > DMA zone automatically and "crashkernel=Y,low" can be used to allocate > specified size low memory. > > When reserving crashkernel in high memory, some low memory is reserved > for crash dump kernel devices. So there may be two regions reserved for > crash dump kernel. > In order to distinct from the high region and make no effect to the use > of existing kexec-tools, rename the low region as "Crash kernel (low)", > and pass the low region by reusing DT property > "linux,usable-memory-range". We made the low memory region as the last > range of "linux,usable-memory-range" to keep compatibility with existing > user-space and older kdump kernels. > > Besides, we need to modify kexec-tools: > arm64: support more than one crash kernel regions(see [1]) > > Another update is document about DT property 'linux,usable-memory-range': > schemas: update 'linux,usable-memory-range' node schema(see [2]) > > > [1]: https://www.spinics.net/lists/kexec/msg28226.html > [2]: https://github.com/robherring/dt-schema/pull/19 > [v1]: https://lkml.org/lkml/2019/4/2/1174 > [v2]: https://lkml.org/lkml/2019/4/9/86 > [v3]: https://lkml.org/lkml/2019/4/9/306 > [v4]: https://lkml.org/lkml/2019/4/15/273 > [v5]: https://lkml.org/lkml/2019/5/6/1360 > [v6]: https://lkml.org/lkml/2019/8/30/142 > [v7]: https://lkml.org/lkml/2019/12/23/411 > [v8]: https://lkml.org/lkml/2020/5/21/213 > [v9]: https://lkml.org/lkml/2020/6/28/73 > [v10]: https://lkml.org/lkml/2020/7/2/1443 > [v11]: https://lkml.org/lkml/2020/8/1/150 > [v12]: https://lkml.org/lkml/2020/9/7/1037 > [v13]: https://lkml.org/lkml/2020/10/31/34 > [v14]: https://lkml.org/lkml/2021/1/30/53 > [v15]: https://lkml.org/lkml/2021/10/19/1405 > [v16]: https://lkml.org/lkml/2021/11/23/435 > [v17]: https://lkml.org/lkml/2021/12/10/38 > [v18]: https://lkml.org/lkml/2021/12/22/424 > [v19]: https://lkml.org/lkml/2021/12/28/203 > [v20]: https://lkml.org/lkml/2022/1/24/167 > [v21]: https://lkml.org/lkml/2022/2/26/350 > > Chen Zhou (2): > arm64: kdump: Reimplement crashkernel=X > of: fdt: Add memory for devices by DT property > "linux,usable-memory-range" > > Zhen Lei (7): > kdump: return -ENOENT if required cmdline option does not exist > arm64: Use insert_resource() to simplify code > arm64: kdump: Remove some redundant checks in map_mem() > arm64: kdump: Don't force page-level mappings for memory above 4G > arm64: kdump: Use page-level mapping for the high memory of > crashkernel > arm64: kdump: Try not to use NO_BLOCK_MAPPINGS for memory under 4G > docs: kdump: Update the crashkernel description for arm64 > > .../admin-guide/kernel-parameters.txt | 8 +- > arch/arm64/include/asm/kexec.h | 2 + > arch/arm64/kernel/machine_kexec.c | 9 +- > arch/arm64/kernel/machine_kexec_file.c | 12 +- > arch/arm64/kernel/setup.c | 17 +- > arch/arm64/mm/init.c | 204 +++++++++++++++++- > arch/arm64/mm/mmu.c | 68 +++--- > drivers/of/fdt.c | 33 ++- > kernel/crash_core.c | 3 +- > 9 files changed, 278 insertions(+), 78 deletions(-) >
Hi, Catalin Marinas and Will Deacon: Do you have time to review patches 3-4 and 6-7? These patches are a little more relevant to the arm64 architecture. Hi, Baoquan He: Can you review patches 1 and 5 again? On 2022/4/14 19:57, Zhen Lei wrote: > Changes since [v21]: > 1. Update the commit message of patch 1 and 5. > 2. Add some comments for reserve_crashkernel() in patch 5. > > Thanks to Baoquan He and John Donnelly for their review comments. > > Because v5.18-rc1 has added a new patch > commit 031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones") > There are many new scenarios: > 1) The mappings may be block or page-level. > 2) The call to reserve_crashkernel() may or may not be deferred. > 3) The the upper limit of DMA address may be 4G, or less than 4G. Or the > upper limit of physical memory, because SMMU can do the mapping. > > The code of patch 1-2, 8-9 keep no change, because the above-mentioned issues are not involved. > The code of patch 5 only makes the following changes: > - if (crash_base >= SZ_4G) > + /* > + * When both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the > + * CRASH_ADDR_LOW_MAX equals the upper limit of physical memory, so > + * the 'crash_base' of high memory can not exceed it. To follow the > + * description of "crashkernel=X,high" option, add below 'high' > + * condition to make sure the crash low memory will be reserved. > + */ > + if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) { > Change SZ_4G to CRASH_ADDR_LOW_MAX, because arm64_dma_phys_limit may be less than > 4G or greater than 4G. The check 'high' is used for "crashkernel=X,high" and > "(crash_base >= CRASH_ADDR_LOW_MAX)" is used for "crashkernel=X[@offset]". > > Patch 3-4 to allow block mappings for memory above 4G. > Patch 6-7 to support only crash high memory or fixed memory range specified by > crashkernel=X@offset use page-level mapping, to allow other areas use block mapping. > These four patches are for performance optimization purposes. For details about the > technical feasibility analysis, please see the commit messages. > > Now the implementation of arm64 is very different from that of x86. It's no longer > suitable for both of them to share code. > > > > Changes since [v20]: > 1. Check whether crashkernel=Y,low is incorrectly configured or not configured. Do different processing. > 2. Share the existing description of x86. The configuration of arm64 is the same as that of x86. > 3. Define the value of macro CRASH_ADDR_HIGH_MAX as memblock.current_limit, instead of MEMBLOCK_ALLOC_ACCESSIBLE. > 4. To improve readability, some lightweight code adjustments have been made to reserve_craskernel(), including comments. > 5. The defined value of DEFAULT_CRASH_KERNEL_LOW_SIZE reconsiders swiotlb, just like x86, to share documents. > > Thanks to Baoquan He for his careful review. > > The test cases are as follows: (Please update the kexec tool to the latest version) > 1) crashkernel=4G //high=4G, low=256M > 2) crashkernel=4G crashkernel=512M,high crashkernel=512M,low //high=4G, low=256M, high and low are ignored > 3) crashkernel=4G crashkernel=512M,high //high=4G, low=256M, high is ignored > 4) crashkernel=4G crashkernel=512M,low //high=4G, low=256M, low is ignored > 5) crashkernel=4G@0xe0000000 //high=0G, low=0M, cannot allocate, failed > 6) crashkernel=512M //high=0G, low=512M > 7) crashkernel=128M //high=0G, low=128M > 8) crashkernel=512M@0xde000000 //512M@3552M //high=0G, low=512M > 9) crashkernel=4G,high //high=4G, low=256M > a) crashkernel=4G,high crashkernel=512M,low //high=4G, low=512M > b) crashkernel=512M,high crashkernel=128M,low //high=512M, low=128M > c) crashkernel=128M,high //high=128M, low=256M > d) crashkernel=512M,low //high=0G, low=0M, invalid > e) crashkernel=512M,high crashkernel=0,low //high=512M, low=0M > f) crashkernel=4G,high crashkernel=ab,low //high=0G, low=0M, invalid > > > Changes since [v19]: > 1. Temporarily stop making reserve_crashkernel[_low]() generic. There are a > lot of details need to be considered, which can take a long time. Because > "make generic" does not add new functions and does not improve performance, > maybe I should say it's just a cleanup. So by stripping it out and leaving > it for other patches later, we can aggregate the changes to the main functions. > 2. Use insert_resource() to replace request_resource(), this not only simplifies > the code, but also reduces the differences between arm64 and x86 implementations. > 3. As commit 157752d84f5d ("kexec: use Crash kernel for Crash kernel low") do for > x86, we can also extend kexec-tools for arm64, and it's currently applied. See: > https://www.spinics.net/lists/kexec/msg28284.html > > Thank you very much, Borislav Petkov, for so many valuable comments. > > Changes since [v17]: v17 --> v19 > 1. Patch 0001-0004 > Introduce generic parse_crashkernel_high_low() to bring the parsing of > "crashkernel=X,high" and the parsing of "crashkernel=X,low" together, > then use it instead of the call to parse_crashkernel_{high|low}(). Two > confusing parameters of parse_crashkernel_{high|low}() are deleted. > > I previously sent these four patches separately: > [1] https://lkml.org/lkml/2021/12/25/40 > 2. Patch 0005-0009 > Introduce generic reserve_crashkernel_mem[_low](), the implementation of > these two functions is based on function reserve_crashkernel[_low]() in > arch/x86/kernel/setup.c. There is no functional change for x86. > 1) The check position of xen_pv_domain() does not change. > 2) Still 1M alignment for crash kernel fixed region, when 'base' is specified. > > To avoid compilation problems on other architectures: patch 0004 moves > the definition of global variable crashk[_low]_res from kexec_core.c to > crash_core.c, and provide default definitions for all macros involved, a > particular platform can redefine these macros to override the default > values. > 3. 0010, only one line of comment was changed. > 4. 0011 > 1) crashk_low_res may also a valid reserved memory, should be checked > in crash_is_nosave(), see arch/arm64/kernel/machine_kexec. > 2) Drop memblock_mark_nomap() for crashk_low_res, because of: > 2687275a5843 arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required > 3) Also call kmemleak_ignore_phys() for crashk_low_res, because of: > 85f58eb18898 arm64: kdump: Skip kmemleak scan reserved memory for kdump > 5. 0012, slightly rebased, because the following patch is applied in advance. > https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/commit/?h=dt/linus&id=8347b41748c3019157312fbe7f8a6792ae396eb7 > 6. 0013, no change. > > Others: > 1. Discard add ARCH_WANT_RESERVE_CRASH_KERNEL > 2. When allocating crash low memory, the start address still starts from 0. > low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); > 3. Discard change (1ULL << 32) to CRASH_ADDR_LOW_MAX. > 4. Ensure the check position of xen_pv_domain() have no change. > 5. Except patch 0010 and 0012, all "Tested-by", "Reviewed-by", "Acked-by" are removed. > 6. Update description. > > > > Changes since [v16] > - Because no functional changes in this version, so add > "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9 > - Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8 > - Update patch 9 based on the review comments of Rob Herring > - As Catalin Marinas's suggestion, merge the implementation of > ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the > contents of X86 and ARM64 do not overlap, and reduce unnecessary > temporary differences. > > Changes since [v15] > - Aggregate the processing of "linux,usable-memory-range" into one function. > Only patch 9-10 have been updated. > > Changes since [v14] > - Recovering the requirement that the CrashKernel memory regions on X86 > only requires 1 MiB alignment. > - Combine patches 5 and 6 in v14 into one. The compilation warning fixed > by patch 6 was introduced by patch 5 in v14. > - As with crashk_res, crashk_low_res is also processed by > crash_exclude_mem_range() in patch 7. > - Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling") > has removed the architecture-specific code, extend the property "linux,usable-memory-range" > in the platform-agnostic FDT core code. See patch 9. > - Discard the x86 description update in the document, because the description > has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide"). > - Change "arm64" to "ARM64" in Doc. > > > Changes since [v13] > - Rebased on top of 5.11-rc5. > - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL. > Since reserve_crashkernel[_low]() implementations are quite similar on > other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in > arch/Kconfig and select this by X86 and ARM64. > - Some minor cleanup. > > Changes since [v12] > - Rebased on top of 5.10-rc1. > - Keep CRASH_ALIGN as 16M suggested by Dave. > - Drop patch "kdump: add threshold for the required memory". > - Add Tested-by from John. > > Changes since [v11] > - Rebased on top of 5.9-rc4. > - Make the function reserve_crashkernel() of x86 generic. > Suggested by Catalin, make the function reserve_crashkernel() of x86 generic > and arm64 use the generic version to reimplement crashkernel=X. > > Changes since [v10] > - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin. > > Changes since [v9] > - Patch 1 add Acked-by from Dave. > - Update patch 5 according to Dave's comments. > - Update chosen schema. > > Changes since [v8] > - Reuse DT property "linux,usable-memory-range". > Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low > memory region. > - Fix kdump broken with ZONE_DMA reintroduced. > - Update chosen schema. > > Changes since [v7] > - Move x86 CRASH_ALIGN to 2M > Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M. > - Update Documentation/devicetree/bindings/chosen.txt. > Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt > suggested by Arnd. > - Add Tested-by from Jhon and pk. > > Changes since [v6] > - Fix build errors reported by kbuild test robot. > > Changes since [v5] > - Move reserve_crashkernel_low() into kernel/crash_core.c. > - Delete crashkernel=X,high. > - Modify crashkernel=X,low. > If crashkernel=X,low is specified simultaneously, reserve spcified size low > memory for crash kdump kernel devices firstly and then reserve memory above 4G. > In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then > pass to crash dump kernel by DT property "linux,low-memory-range". > - Update Documentation/admin-guide/kdump/kdump.rst. > > Changes since [v4] > - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike. > > Changes since [v3] > - Add memblock_cap_memory_ranges back for multiple ranges. > - Fix some compiling warnings. > > Changes since [v2] > - Split patch "arm64: kdump: support reserving crashkernel above 4G" as > two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate > patch. > > Changes since [v1]: > - Move common reserve_crashkernel_low() code into kernel/kexec_core.c. > - Remove memblock_cap_memory_ranges() i added in v1 and implement that > in fdt_enforce_memory_region(). > There are at most two crash kernel regions, for two crash kernel regions > case, we cap the memory range [min(regs[*].start), max(regs[*].end)] > and then remove the memory range in the middle. > > v1: > There are following issues in arm64 kdump: > 1. We use crashkernel=X to reserve crashkernel below 4G, which > will fail when there is no enough low memory. > 2. If reserving crashkernel above 4G, in this case, crash dump > kernel will boot failure because there is no low memory available > for allocation. > > To solve these issues, change the behavior of crashkernel=X. > crashkernel=X tries low allocation in DMA zone and fall back to high > allocation if it fails. > > We can also use "crashkernel=X,high" to select a high region above > DMA zone, which also tries to allocate at least 256M low memory in > DMA zone automatically and "crashkernel=Y,low" can be used to allocate > specified size low memory. > > When reserving crashkernel in high memory, some low memory is reserved > for crash dump kernel devices. So there may be two regions reserved for > crash dump kernel. > In order to distinct from the high region and make no effect to the use > of existing kexec-tools, rename the low region as "Crash kernel (low)", > and pass the low region by reusing DT property > "linux,usable-memory-range". We made the low memory region as the last > range of "linux,usable-memory-range" to keep compatibility with existing > user-space and older kdump kernels. > > Besides, we need to modify kexec-tools: > arm64: support more than one crash kernel regions(see [1]) > > Another update is document about DT property 'linux,usable-memory-range': > schemas: update 'linux,usable-memory-range' node schema(see [2]) > > > [1]: https://www.spinics.net/lists/kexec/msg28226.html > [2]: https://github.com/robherring/dt-schema/pull/19 > [v1]: https://lkml.org/lkml/2019/4/2/1174 > [v2]: https://lkml.org/lkml/2019/4/9/86 > [v3]: https://lkml.org/lkml/2019/4/9/306 > [v4]: https://lkml.org/lkml/2019/4/15/273 > [v5]: https://lkml.org/lkml/2019/5/6/1360 > [v6]: https://lkml.org/lkml/2019/8/30/142 > [v7]: https://lkml.org/lkml/2019/12/23/411 > [v8]: https://lkml.org/lkml/2020/5/21/213 > [v9]: https://lkml.org/lkml/2020/6/28/73 > [v10]: https://lkml.org/lkml/2020/7/2/1443 > [v11]: https://lkml.org/lkml/2020/8/1/150 > [v12]: https://lkml.org/lkml/2020/9/7/1037 > [v13]: https://lkml.org/lkml/2020/10/31/34 > [v14]: https://lkml.org/lkml/2021/1/30/53 > [v15]: https://lkml.org/lkml/2021/10/19/1405 > [v16]: https://lkml.org/lkml/2021/11/23/435 > [v17]: https://lkml.org/lkml/2021/12/10/38 > [v18]: https://lkml.org/lkml/2021/12/22/424 > [v19]: https://lkml.org/lkml/2021/12/28/203 > [v20]: https://lkml.org/lkml/2022/1/24/167 > [v21]: https://lkml.org/lkml/2022/2/26/350 > > Chen Zhou (2): > arm64: kdump: Reimplement crashkernel=X > of: fdt: Add memory for devices by DT property > "linux,usable-memory-range" > > Zhen Lei (7): > kdump: return -ENOENT if required cmdline option does not exist > arm64: Use insert_resource() to simplify code > arm64: kdump: Remove some redundant checks in map_mem() > arm64: kdump: Don't force page-level mappings for memory above 4G > arm64: kdump: Use page-level mapping for the high memory of > crashkernel > arm64: kdump: Try not to use NO_BLOCK_MAPPINGS for memory under 4G > docs: kdump: Update the crashkernel description for arm64 > > .../admin-guide/kernel-parameters.txt | 8 +- > arch/arm64/include/asm/kexec.h | 2 + > arch/arm64/kernel/machine_kexec.c | 9 +- > arch/arm64/kernel/machine_kexec_file.c | 12 +- > arch/arm64/kernel/setup.c | 17 +- > arch/arm64/mm/init.c | 204 +++++++++++++++++- > arch/arm64/mm/mmu.c | 68 +++--- > drivers/of/fdt.c | 33 ++- > kernel/crash_core.c | 3 +- > 9 files changed, 278 insertions(+), 78 deletions(-) >
On 04/25/22 at 10:19am, Leizhen (ThunderTown) wrote: > > Hi, Catalin Marinas and Will Deacon: > Do you have time to review patches 3-4 and 6-7? These patches are a little more > relevant to the arm64 architecture. > > Hi, Baoquan He: > Can you review patches 1 and 5 again? Yes, and I will test and review the whole series if no one check the arm64 related changes. > > > On 2022/4/14 19:57, Zhen Lei wrote: > > Changes since [v21]: > > 1. Update the commit message of patch 1 and 5. > > 2. Add some comments for reserve_crashkernel() in patch 5. > > > > Thanks to Baoquan He and John Donnelly for their review comments. > > > > Because v5.18-rc1 has added a new patch > > commit 031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones") > > There are many new scenarios: > > 1) The mappings may be block or page-level. > > 2) The call to reserve_crashkernel() may or may not be deferred. > > 3) The the upper limit of DMA address may be 4G, or less than 4G. Or the > > upper limit of physical memory, because SMMU can do the mapping. > > > > The code of patch 1-2, 8-9 keep no change, because the above-mentioned issues are not involved. > > The code of patch 5 only makes the following changes: > > - if (crash_base >= SZ_4G) > > + /* > > + * When both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the > > + * CRASH_ADDR_LOW_MAX equals the upper limit of physical memory, so > > + * the 'crash_base' of high memory can not exceed it. To follow the > > + * description of "crashkernel=X,high" option, add below 'high' > > + * condition to make sure the crash low memory will be reserved. > > + */ > > + if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) { > > Change SZ_4G to CRASH_ADDR_LOW_MAX, because arm64_dma_phys_limit may be less than > > 4G or greater than 4G. The check 'high' is used for "crashkernel=X,high" and > > "(crash_base >= CRASH_ADDR_LOW_MAX)" is used for "crashkernel=X[@offset]". > > > > Patch 3-4 to allow block mappings for memory above 4G. > > Patch 6-7 to support only crash high memory or fixed memory range specified by > > crashkernel=X@offset use page-level mapping, to allow other areas use block mapping. > > These four patches are for performance optimization purposes. For details about the > > technical feasibility analysis, please see the commit messages. > > > > Now the implementation of arm64 is very different from that of x86. It's no longer > > suitable for both of them to share code. > > > > > > > > Changes since [v20]: > > 1. Check whether crashkernel=Y,low is incorrectly configured or not configured. Do different processing. > > 2. Share the existing description of x86. The configuration of arm64 is the same as that of x86. > > 3. Define the value of macro CRASH_ADDR_HIGH_MAX as memblock.current_limit, instead of MEMBLOCK_ALLOC_ACCESSIBLE. > > 4. To improve readability, some lightweight code adjustments have been made to reserve_craskernel(), including comments. > > 5. The defined value of DEFAULT_CRASH_KERNEL_LOW_SIZE reconsiders swiotlb, just like x86, to share documents. > > > > Thanks to Baoquan He for his careful review. > > > > The test cases are as follows: (Please update the kexec tool to the latest version) > > 1) crashkernel=4G //high=4G, low=256M > > 2) crashkernel=4G crashkernel=512M,high crashkernel=512M,low //high=4G, low=256M, high and low are ignored > > 3) crashkernel=4G crashkernel=512M,high //high=4G, low=256M, high is ignored > > 4) crashkernel=4G crashkernel=512M,low //high=4G, low=256M, low is ignored > > 5) crashkernel=4G@0xe0000000 //high=0G, low=0M, cannot allocate, failed > > 6) crashkernel=512M //high=0G, low=512M > > 7) crashkernel=128M //high=0G, low=128M > > 8) crashkernel=512M@0xde000000 //512M@3552M //high=0G, low=512M > > 9) crashkernel=4G,high //high=4G, low=256M > > a) crashkernel=4G,high crashkernel=512M,low //high=4G, low=512M > > b) crashkernel=512M,high crashkernel=128M,low //high=512M, low=128M > > c) crashkernel=128M,high //high=128M, low=256M > > d) crashkernel=512M,low //high=0G, low=0M, invalid > > e) crashkernel=512M,high crashkernel=0,low //high=512M, low=0M > > f) crashkernel=4G,high crashkernel=ab,low //high=0G, low=0M, invalid > > > > > > Changes since [v19]: > > 1. Temporarily stop making reserve_crashkernel[_low]() generic. There are a > > lot of details need to be considered, which can take a long time. Because > > "make generic" does not add new functions and does not improve performance, > > maybe I should say it's just a cleanup. So by stripping it out and leaving > > it for other patches later, we can aggregate the changes to the main functions. > > 2. Use insert_resource() to replace request_resource(), this not only simplifies > > the code, but also reduces the differences between arm64 and x86 implementations. > > 3. As commit 157752d84f5d ("kexec: use Crash kernel for Crash kernel low") do for > > x86, we can also extend kexec-tools for arm64, and it's currently applied. See: > > https://www.spinics.net/lists/kexec/msg28284.html > > > > Thank you very much, Borislav Petkov, for so many valuable comments. > > > > Changes since [v17]: v17 --> v19 > > 1. Patch 0001-0004 > > Introduce generic parse_crashkernel_high_low() to bring the parsing of > > "crashkernel=X,high" and the parsing of "crashkernel=X,low" together, > > then use it instead of the call to parse_crashkernel_{high|low}(). Two > > confusing parameters of parse_crashkernel_{high|low}() are deleted. > > > > I previously sent these four patches separately: > > [1] https://lkml.org/lkml/2021/12/25/40 > > 2. Patch 0005-0009 > > Introduce generic reserve_crashkernel_mem[_low](), the implementation of > > these two functions is based on function reserve_crashkernel[_low]() in > > arch/x86/kernel/setup.c. There is no functional change for x86. > > 1) The check position of xen_pv_domain() does not change. > > 2) Still 1M alignment for crash kernel fixed region, when 'base' is specified. > > > > To avoid compilation problems on other architectures: patch 0004 moves > > the definition of global variable crashk[_low]_res from kexec_core.c to > > crash_core.c, and provide default definitions for all macros involved, a > > particular platform can redefine these macros to override the default > > values. > > 3. 0010, only one line of comment was changed. > > 4. 0011 > > 1) crashk_low_res may also a valid reserved memory, should be checked > > in crash_is_nosave(), see arch/arm64/kernel/machine_kexec. > > 2) Drop memblock_mark_nomap() for crashk_low_res, because of: > > 2687275a5843 arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required > > 3) Also call kmemleak_ignore_phys() for crashk_low_res, because of: > > 85f58eb18898 arm64: kdump: Skip kmemleak scan reserved memory for kdump > > 5. 0012, slightly rebased, because the following patch is applied in advance. > > https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/commit/?h=dt/linus&id=8347b41748c3019157312fbe7f8a6792ae396eb7 > > 6. 0013, no change. > > > > Others: > > 1. Discard add ARCH_WANT_RESERVE_CRASH_KERNEL > > 2. When allocating crash low memory, the start address still starts from 0. > > low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); > > 3. Discard change (1ULL << 32) to CRASH_ADDR_LOW_MAX. > > 4. Ensure the check position of xen_pv_domain() have no change. > > 5. Except patch 0010 and 0012, all "Tested-by", "Reviewed-by", "Acked-by" are removed. > > 6. Update description. > > > > > > > > Changes since [v16] > > - Because no functional changes in this version, so add > > "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9 > > - Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8 > > - Update patch 9 based on the review comments of Rob Herring > > - As Catalin Marinas's suggestion, merge the implementation of > > ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the > > contents of X86 and ARM64 do not overlap, and reduce unnecessary > > temporary differences. > > > > Changes since [v15] > > - Aggregate the processing of "linux,usable-memory-range" into one function. > > Only patch 9-10 have been updated. > > > > Changes since [v14] > > - Recovering the requirement that the CrashKernel memory regions on X86 > > only requires 1 MiB alignment. > > - Combine patches 5 and 6 in v14 into one. The compilation warning fixed > > by patch 6 was introduced by patch 5 in v14. > > - As with crashk_res, crashk_low_res is also processed by > > crash_exclude_mem_range() in patch 7. > > - Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling") > > has removed the architecture-specific code, extend the property "linux,usable-memory-range" > > in the platform-agnostic FDT core code. See patch 9. > > - Discard the x86 description update in the document, because the description > > has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide"). > > - Change "arm64" to "ARM64" in Doc. > > > > > > Changes since [v13] > > - Rebased on top of 5.11-rc5. > > - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL. > > Since reserve_crashkernel[_low]() implementations are quite similar on > > other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in > > arch/Kconfig and select this by X86 and ARM64. > > - Some minor cleanup. > > > > Changes since [v12] > > - Rebased on top of 5.10-rc1. > > - Keep CRASH_ALIGN as 16M suggested by Dave. > > - Drop patch "kdump: add threshold for the required memory". > > - Add Tested-by from John. > > > > Changes since [v11] > > - Rebased on top of 5.9-rc4. > > - Make the function reserve_crashkernel() of x86 generic. > > Suggested by Catalin, make the function reserve_crashkernel() of x86 generic > > and arm64 use the generic version to reimplement crashkernel=X. > > > > Changes since [v10] > > - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin. > > > > Changes since [v9] > > - Patch 1 add Acked-by from Dave. > > - Update patch 5 according to Dave's comments. > > - Update chosen schema. > > > > Changes since [v8] > > - Reuse DT property "linux,usable-memory-range". > > Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low > > memory region. > > - Fix kdump broken with ZONE_DMA reintroduced. > > - Update chosen schema. > > > > Changes since [v7] > > - Move x86 CRASH_ALIGN to 2M > > Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M. > > - Update Documentation/devicetree/bindings/chosen.txt. > > Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt > > suggested by Arnd. > > - Add Tested-by from Jhon and pk. > > > > Changes since [v6] > > - Fix build errors reported by kbuild test robot. > > > > Changes since [v5] > > - Move reserve_crashkernel_low() into kernel/crash_core.c. > > - Delete crashkernel=X,high. > > - Modify crashkernel=X,low. > > If crashkernel=X,low is specified simultaneously, reserve spcified size low > > memory for crash kdump kernel devices firstly and then reserve memory above 4G. > > In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then > > pass to crash dump kernel by DT property "linux,low-memory-range". > > - Update Documentation/admin-guide/kdump/kdump.rst. > > > > Changes since [v4] > > - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike. > > > > Changes since [v3] > > - Add memblock_cap_memory_ranges back for multiple ranges. > > - Fix some compiling warnings. > > > > Changes since [v2] > > - Split patch "arm64: kdump: support reserving crashkernel above 4G" as > > two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate > > patch. > > > > Changes since [v1]: > > - Move common reserve_crashkernel_low() code into kernel/kexec_core.c. > > - Remove memblock_cap_memory_ranges() i added in v1 and implement that > > in fdt_enforce_memory_region(). > > There are at most two crash kernel regions, for two crash kernel regions > > case, we cap the memory range [min(regs[*].start), max(regs[*].end)] > > and then remove the memory range in the middle. > > > > v1: > > There are following issues in arm64 kdump: > > 1. We use crashkernel=X to reserve crashkernel below 4G, which > > will fail when there is no enough low memory. > > 2. If reserving crashkernel above 4G, in this case, crash dump > > kernel will boot failure because there is no low memory available > > for allocation. > > > > To solve these issues, change the behavior of crashkernel=X. > > crashkernel=X tries low allocation in DMA zone and fall back to high > > allocation if it fails. > > > > We can also use "crashkernel=X,high" to select a high region above > > DMA zone, which also tries to allocate at least 256M low memory in > > DMA zone automatically and "crashkernel=Y,low" can be used to allocate > > specified size low memory. > > > > When reserving crashkernel in high memory, some low memory is reserved > > for crash dump kernel devices. So there may be two regions reserved for > > crash dump kernel. > > In order to distinct from the high region and make no effect to the use > > of existing kexec-tools, rename the low region as "Crash kernel (low)", > > and pass the low region by reusing DT property > > "linux,usable-memory-range". We made the low memory region as the last > > range of "linux,usable-memory-range" to keep compatibility with existing > > user-space and older kdump kernels. > > > > Besides, we need to modify kexec-tools: > > arm64: support more than one crash kernel regions(see [1]) > > > > Another update is document about DT property 'linux,usable-memory-range': > > schemas: update 'linux,usable-memory-range' node schema(see [2]) > > > > > > [1]: https://www.spinics.net/lists/kexec/msg28226.html > > [2]: https://github.com/robherring/dt-schema/pull/19 > > [v1]: https://lkml.org/lkml/2019/4/2/1174 > > [v2]: https://lkml.org/lkml/2019/4/9/86 > > [v3]: https://lkml.org/lkml/2019/4/9/306 > > [v4]: https://lkml.org/lkml/2019/4/15/273 > > [v5]: https://lkml.org/lkml/2019/5/6/1360 > > [v6]: https://lkml.org/lkml/2019/8/30/142 > > [v7]: https://lkml.org/lkml/2019/12/23/411 > > [v8]: https://lkml.org/lkml/2020/5/21/213 > > [v9]: https://lkml.org/lkml/2020/6/28/73 > > [v10]: https://lkml.org/lkml/2020/7/2/1443 > > [v11]: https://lkml.org/lkml/2020/8/1/150 > > [v12]: https://lkml.org/lkml/2020/9/7/1037 > > [v13]: https://lkml.org/lkml/2020/10/31/34 > > [v14]: https://lkml.org/lkml/2021/1/30/53 > > [v15]: https://lkml.org/lkml/2021/10/19/1405 > > [v16]: https://lkml.org/lkml/2021/11/23/435 > > [v17]: https://lkml.org/lkml/2021/12/10/38 > > [v18]: https://lkml.org/lkml/2021/12/22/424 > > [v19]: https://lkml.org/lkml/2021/12/28/203 > > [v20]: https://lkml.org/lkml/2022/1/24/167 > > [v21]: https://lkml.org/lkml/2022/2/26/350 > > > > Chen Zhou (2): > > arm64: kdump: Reimplement crashkernel=X > > of: fdt: Add memory for devices by DT property > > "linux,usable-memory-range" > > > > Zhen Lei (7): > > kdump: return -ENOENT if required cmdline option does not exist > > arm64: Use insert_resource() to simplify code > > arm64: kdump: Remove some redundant checks in map_mem() > > arm64: kdump: Don't force page-level mappings for memory above 4G > > arm64: kdump: Use page-level mapping for the high memory of > > crashkernel > > arm64: kdump: Try not to use NO_BLOCK_MAPPINGS for memory under 4G > > docs: kdump: Update the crashkernel description for arm64 > > > > .../admin-guide/kernel-parameters.txt | 8 +- > > arch/arm64/include/asm/kexec.h | 2 + > > arch/arm64/kernel/machine_kexec.c | 9 +- > > arch/arm64/kernel/machine_kexec_file.c | 12 +- > > arch/arm64/kernel/setup.c | 17 +- > > arch/arm64/mm/init.c | 204 +++++++++++++++++- > > arch/arm64/mm/mmu.c | 68 +++--- > > drivers/of/fdt.c | 33 ++- > > kernel/crash_core.c | 3 +- > > 9 files changed, 278 insertions(+), 78 deletions(-) > > > > -- > Regards, > Zhen Lei >
On 04/14/22 at 07:57pm, Zhen Lei wrote: > According to the current crashkernel=Y,low support in other ARCHes, it's > an optional command-line option. When it doesn't exist, kernel will try > to allocate minimum required memory below 4G automatically. > > However, __parse_crashkernel() returns '-EINVAL' for all error cases. It > can't distinguish the nonexistent option from invalid option. > > Change __parse_crashkernel() to return '-ENOENT' for the nonexistent option > case. With this change, crashkernel,low memory will take the default > value if crashkernel=,low is not specified; while crashkernel reservation > will fail and bail out if an invalid option is specified. > > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> > --- > kernel/crash_core.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > index 256cf6db573cd09..4d57c03714f4e13 100644 > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -243,9 +243,8 @@ static int __init __parse_crashkernel(char *cmdline, > *crash_base = 0; > > ck_cmdline = get_last_crashkernel(cmdline, name, suffix); > - > if (!ck_cmdline) > - return -EINVAL; > + return -ENOENT; > > ck_cmdline += strlen(name); > > -- > 2.25.1 >
On 2022/4/25 10:45, Baoquan He wrote: > On 04/25/22 at 10:19am, Leizhen (ThunderTown) wrote: >> >> Hi, Catalin Marinas and Will Deacon: >> Do you have time to review patches 3-4 and 6-7? These patches are a little more >> relevant to the arm64 architecture. >> >> Hi, Baoquan He: >> Can you review patches 1 and 5 again? > > Yes, and I will test and review the whole series if no one check the > arm64 related changes. Okay, thank you so much. > >> >>
On Thu, Apr 14, 2022 at 07:57:15PM +0800, Zhen Lei wrote: > @@ -540,13 +540,31 @@ static void __init map_mem(pgd_t *pgdp) > for_each_mem_range(i, &start, &end) { > if (start >= end) > break; > + > +#ifdef CONFIG_KEXEC_CORE > + if (eflags && (end >= SZ_4G)) { > + /* > + * The memory block cross the 4G boundary. > + * Forcibly use page-level mappings for memory under 4G. > + */ > + if (start < SZ_4G) { > + __map_memblock(pgdp, start, SZ_4G - 1, > + pgprot_tagged(PAGE_KERNEL), flags | eflags); > + start = SZ_4G; > + } > + > + /* Page-level mappings is not mandatory for memory above 4G */ > + eflags = 0; > + } > +#endif That's a bit tricky if a SoC has all RAM above 4G. IIRC AMD Seattle had this layout. See max_zone_phys() for how we deal with this, basically extending ZONE_DMA to the whole range if RAM starts above 4GB. In that case, crashkernel reservation would fall in the range above 4GB. BTW, we changed the max_zone_phys() logic with commit 791ab8b2e3db ("arm64: Ignore any DMA offsets in the max_zone_phys() calculation").
On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote: > /* > * reserve_crashkernel() - reserves memory for crash kernel > * > * This function reserves memory area given in "crashkernel=" kernel command > * line parameter. The memory reserved is used by dump capture kernel when > * primary kernel is crashing. > + * > + * NOTE: Reservation of crashkernel,low is special since its existence > + * is not independent, need rely on the existence of crashkernel,high. > + * Here, four cases of crashkernel low memory reservation are summarized: > + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low > + * memory takes Y; > + * 2) crashkernel=,low is not given, while crashkernel=,high is specified, > + * take the default crashkernel low memory size; > + * 3) crashkernel=X is specified, while fallback to get a memory region > + * in high memory, take the default crashkernel low memory size; > + * 4) crashkernel='invalid value',low is specified, failed the whole > + * crashkernel reservation and bail out. Following the x86 behaviour made sense when we were tried to get that code generic. Now that we moved the logic under arch/arm64, we can diverge a bit. I lost track of the original (v1/v2) proposal but I wonder whether we still need the fallback to high for crashkernel=Y. Maybe simpler, no fallbacks: crashkernel=Y - keep the current behaviour, ignore high,low crashkernel=Y,high - allocate above ZONE_DMA crashkernel=Y,low - allocate within ZONE_DMA From your proposal, the difference is that the Y,high option won't have any default ZONE_DMA fallback, one would have to explicitly pass the Y,low option if needed. Just a thought, maybe it makes the code simpler. But I'm open to discussion if there are good arguments for the proposed (x86-like) behaviour. One argument could be for crashkernel=Y to fall back to high if distros don't want to bother with high/low settings. Another thing I may have asked in the past, what happens if we run a new kernel with these patches with old kexec user tools. I suspect the crashkernel=Y with the fallback to high will confuse the tools. BTW, please separate the NO_BLOCK_MAPPINGS optimisations from the crashkernel above 4G. Let's get the crashkernel reservations sorted first, it's been around for too long. Thanks.
On 2022/4/27 2:02, Catalin Marinas wrote: > On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote: >> /* >> * reserve_crashkernel() - reserves memory for crash kernel >> * >> * This function reserves memory area given in "crashkernel=" kernel command >> * line parameter. The memory reserved is used by dump capture kernel when >> * primary kernel is crashing. >> + * >> + * NOTE: Reservation of crashkernel,low is special since its existence >> + * is not independent, need rely on the existence of crashkernel,high. >> + * Here, four cases of crashkernel low memory reservation are summarized: >> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low >> + * memory takes Y; >> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified, >> + * take the default crashkernel low memory size; >> + * 3) crashkernel=X is specified, while fallback to get a memory region >> + * in high memory, take the default crashkernel low memory size; >> + * 4) crashkernel='invalid value',low is specified, failed the whole >> + * crashkernel reservation and bail out. > > Following the x86 behaviour made sense when we were tried to get that > code generic. Now that we moved the logic under arch/arm64, we can > diverge a bit. I lost track of the original (v1/v2) proposal but I > wonder whether we still need the fallback to high for crashkernel=Y. I don't think anyone has raised this demand yet! If it weren't for the fact that crashkernel=X appeared earlier, it would probably have been enough for a combination of crashkernel=X,high and crashkernel=Y,low. In fact, I also tend not to support "fallback to high for crashkernel=Y". I took over this from Chen Zhou. In the absence of any objection, I had to inherit. Now that you've brought it up, I'm happy to delete it. Supporting this feature complicates the code logic a lot. The point is, it's not fully backwards compatible yet. For example, someone may want crashkernel=3G to report failure, but the the new support make it work. > Maybe simpler, no fallbacks: > > crashkernel=Y - keep the current behaviour, ignore high,low > crashkernel=Y,high - allocate above ZONE_DMA > crashkernel=Y,low - allocate within ZONE_DMA > >>From your proposal, the difference is that the Y,high option won't > have any default ZONE_DMA fallback, one would have to explicitly pass > the Y,low option if needed. I agree with you. Now we don't need code generic, so there is no need to carry the historical burden of other ARCHs. arm64 does not need to delve into that empirical value(the default size of crash low memory). > > Just a thought, maybe it makes the code simpler. But I'm open to > discussion if there are good arguments for the proposed (x86-like) > behaviour. One argument could be for crashkernel=Y to fall back to high > if distros don't want to bother with high/low settings. I think distros should take precedence over "crashkernel=Y,high". After all, ZONE_DMA memory is more valuable than high memory. > > Another thing I may have asked in the past, what happens if we run a new > kernel with these patches with old kexec user tools. I suspect the > crashkernel=Y with the fallback to high will confuse the tools. If crashkernel=Y can reserve the memory in Zone_DMA successfully, the old kexec works well. But if crashkernel=Y fallback to high memory, the second kernel will boot failed, because the old kexec can only use dtb to pass the high memory range to the second kernel. In comparison, if no fallback, we can see crash memory reservation failure in the first kernel, so we have a chance to adjust Y. Currently, the new kexec tool will pick the last memory range(sorted by address in ascending order) to store Image,dtb,initrd. > > BTW, please separate the NO_BLOCK_MAPPINGS optimisations from the > crashkernel above 4G. Let's get the crashkernel reservations sorted > first, it's been around for too long. OK, thank you. That's a good suggestion. > > Thanks. >
On 2022/4/26 22:26, Catalin Marinas wrote: > On Thu, Apr 14, 2022 at 07:57:15PM +0800, Zhen Lei wrote: >> @@ -540,13 +540,31 @@ static void __init map_mem(pgd_t *pgdp) >> for_each_mem_range(i, &start, &end) { >> if (start >= end) >> break; >> + >> +#ifdef CONFIG_KEXEC_CORE >> + if (eflags && (end >= SZ_4G)) { >> + /* >> + * The memory block cross the 4G boundary. >> + * Forcibly use page-level mappings for memory under 4G. >> + */ >> + if (start < SZ_4G) { >> + __map_memblock(pgdp, start, SZ_4G - 1, >> + pgprot_tagged(PAGE_KERNEL), flags | eflags); >> + start = SZ_4G; >> + } >> + >> + /* Page-level mappings is not mandatory for memory above 4G */ >> + eflags = 0; >> + } >> +#endif > > That's a bit tricky if a SoC has all RAM above 4G. IIRC AMD Seattle had > this layout. See max_zone_phys() for how we deal with this, basically > extending ZONE_DMA to the whole range if RAM starts above 4GB. In that > case, crashkernel reservation would fall in the range above 4GB. > > BTW, we changed the max_zone_phys() logic with commit 791ab8b2e3db > ("arm64: Ignore any DMA offsets in the max_zone_phys() calculation"). Okay, thanks for your correction. I'll dig into it after I've done the original requirement. >
On Wed, Apr 27, 2022 at 02:54:52PM +0800, Leizhen (ThunderTown) wrote: > On 2022/4/27 2:02, Catalin Marinas wrote: > > On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote: > >> /* > >> * reserve_crashkernel() - reserves memory for crash kernel > >> * > >> * This function reserves memory area given in "crashkernel=" kernel command > >> * line parameter. The memory reserved is used by dump capture kernel when > >> * primary kernel is crashing. > >> + * > >> + * NOTE: Reservation of crashkernel,low is special since its existence > >> + * is not independent, need rely on the existence of crashkernel,high. > >> + * Here, four cases of crashkernel low memory reservation are summarized: > >> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low > >> + * memory takes Y; > >> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified, > >> + * take the default crashkernel low memory size; > >> + * 3) crashkernel=X is specified, while fallback to get a memory region > >> + * in high memory, take the default crashkernel low memory size; > >> + * 4) crashkernel='invalid value',low is specified, failed the whole > >> + * crashkernel reservation and bail out. > > > > Following the x86 behaviour made sense when we were tried to get that > > code generic. Now that we moved the logic under arch/arm64, we can > > diverge a bit. I lost track of the original (v1/v2) proposal but I > > wonder whether we still need the fallback to high for crashkernel=Y. > > I don't think anyone has raised this demand yet! If it weren't for the > fact that crashkernel=X appeared earlier, it would probably have been > enough for a combination of crashkernel=X,high and crashkernel=Y,low. > > In fact, I also tend not to support "fallback to high for crashkernel=Y". > I took over this from Chen Zhou. In the absence of any objection, I had > to inherit. Now that you've brought it up, I'm happy to delete it. > Supporting this feature complicates the code logic a lot. The point is, > it's not fully backwards compatible yet. For example, someone may want > crashkernel=3G to report failure, but the the new support make it work. BTW, prior to v20, this patch had this line: crashk_low_res.name = "Crash kernel (low)"; I can't find it anymore. Do the kexec tools need to distinguish between low and high or they can cope with multiple "Crash kernel" entries? > > Maybe simpler, no fallbacks: > > > > crashkernel=Y - keep the current behaviour, ignore high,low > > crashkernel=Y,high - allocate above ZONE_DMA > > crashkernel=Y,low - allocate within ZONE_DMA > > > > From your proposal, the difference is that the Y,high option won't > > have any default ZONE_DMA fallback, one would have to explicitly pass > > the Y,low option if needed. > > I agree with you. Now we don't need code generic, so there is no need to > carry the historical burden of other ARCHs. arm64 does not need to delve > into that empirical value(the default size of crash low memory). > > > Just a thought, maybe it makes the code simpler. But I'm open to > > discussion if there are good arguments for the proposed (x86-like) > > behaviour. One argument could be for crashkernel=Y to fall back to high > > if distros don't want to bother with high/low settings. > > I think distros should take precedence over "crashkernel=Y,high". After all, > ZONE_DMA memory is more valuable than high memory. My point is whether an admin configuring the kernel command line needs to know the layout of ZONE_DMA etc. to figure out how much to pass in high and low. The fallbacks in this case have some value but they also complicate the code logic. The 4GB limit does not always make sense either for some platforms (RPi4 has a ZONE_DMA limit of 1GB). I think one could always pass a default command line like: crashkernel=1G,high crashkernel=128M,low without much knowledge of the SoC memory layout. Another option is to only introduce crashkernel=Y,low and, when that is passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a 'high' option at all: crashkernel=1G - all within ZONE_DMA crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA 1G above ZONE_DMA If ZONE_DMA is not present or it extends to the whole RAM, we can ignore the 'low' option.
On 2022/4/27 20:32, Catalin Marinas wrote: > On Wed, Apr 27, 2022 at 02:54:52PM +0800, Leizhen (ThunderTown) wrote: >> On 2022/4/27 2:02, Catalin Marinas wrote: >>> On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote: >>>> /* >>>> * reserve_crashkernel() - reserves memory for crash kernel >>>> * >>>> * This function reserves memory area given in "crashkernel=" kernel command >>>> * line parameter. The memory reserved is used by dump capture kernel when >>>> * primary kernel is crashing. >>>> + * >>>> + * NOTE: Reservation of crashkernel,low is special since its existence >>>> + * is not independent, need rely on the existence of crashkernel,high. >>>> + * Here, four cases of crashkernel low memory reservation are summarized: >>>> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low >>>> + * memory takes Y; >>>> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified, >>>> + * take the default crashkernel low memory size; >>>> + * 3) crashkernel=X is specified, while fallback to get a memory region >>>> + * in high memory, take the default crashkernel low memory size; >>>> + * 4) crashkernel='invalid value',low is specified, failed the whole >>>> + * crashkernel reservation and bail out. >>> >>> Following the x86 behaviour made sense when we were tried to get that >>> code generic. Now that we moved the logic under arch/arm64, we can >>> diverge a bit. I lost track of the original (v1/v2) proposal but I >>> wonder whether we still need the fallback to high for crashkernel=Y. >> >> I don't think anyone has raised this demand yet! If it weren't for the >> fact that crashkernel=X appeared earlier, it would probably have been >> enough for a combination of crashkernel=X,high and crashkernel=Y,low. >> >> In fact, I also tend not to support "fallback to high for crashkernel=Y". >> I took over this from Chen Zhou. In the absence of any objection, I had >> to inherit. Now that you've brought it up, I'm happy to delete it. >> Supporting this feature complicates the code logic a lot. The point is, >> it's not fully backwards compatible yet. For example, someone may want >> crashkernel=3G to report failure, but the the new support make it work. > > BTW, prior to v20, this patch had this line: > > crashk_low_res.name = "Crash kernel (low)"; > > I can't find it anymore. Do the kexec tools need to distinguish between > low and high or they can cope with multiple "Crash kernel" entries? Yes, I've updated the kexec tool patch based on Borislav Petkov's comments to keep it the same as x86. And this patch has been merged into kexec v2.0.24. b5a34a2 arm64: support more than one crash kernel regions The kexec tool will first sorted all "Crash kernel" memory range in ascending order, then choose the last one (the memory range with the highest address) to store the second kernel's Image,dtb,initrd. > >>> Maybe simpler, no fallbacks: >>> >>> crashkernel=Y - keep the current behaviour, ignore high,low >>> crashkernel=Y,high - allocate above ZONE_DMA >>> crashkernel=Y,low - allocate within ZONE_DMA >>> >>> From your proposal, the difference is that the Y,high option won't >>> have any default ZONE_DMA fallback, one would have to explicitly pass >>> the Y,low option if needed. >> >> I agree with you. Now we don't need code generic, so there is no need to >> carry the historical burden of other ARCHs. arm64 does not need to delve >> into that empirical value(the default size of crash low memory). >> >>> Just a thought, maybe it makes the code simpler. But I'm open to >>> discussion if there are good arguments for the proposed (x86-like) >>> behaviour. One argument could be for crashkernel=Y to fall back to high >>> if distros don't want to bother with high/low settings. >> >> I think distros should take precedence over "crashkernel=Y,high". After all, >> ZONE_DMA memory is more valuable than high memory. > > My point is whether an admin configuring the kernel command line needs > to know the layout of ZONE_DMA etc. to figure out how much to pass in No need. > high and low. The fallbacks in this case have some value but they also > complicate the code logic. The 4GB limit does not always make sense > either for some platforms (RPi4 has a ZONE_DMA limit of 1GB). > > I think one could always pass a default command line like: > > crashkernel=1G,high crashkernel=128M,low > > without much knowledge of the SoC memory layout. Yes, that's what the end result is. The user specify crashkernel=128M,low and the implementation ensure the 128M low memory is allocated from DMA zone. We use arm64_dma_phys_limit as the upper limit for crash low memory. +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base, crash_max); > > Another option is to only introduce crashkernel=Y,low and, when that is > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a > 'high' option at all: > > crashkernel=1G - all within ZONE_DMA > crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA > 1G above ZONE_DMA > > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore > the 'low' option. I think although the code is hard to make generic, the interface is better to be relatively uniform. A user might have to maintain both x86 and arm64, and so on. It's not a good thing that the difference is too big. A well had already been dug by the forefathers, and it seemed unnecessary to dig another well beside it if there was no particular advantage. >
On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: > On 2022/4/27 20:32, Catalin Marinas wrote: > > I think one could always pass a default command line like: > > > > crashkernel=1G,high crashkernel=128M,low > > > > without much knowledge of the SoC memory layout. > > Yes, that's what the end result is. The user specify crashkernel=128M,low > and the implementation ensure the 128M low memory is allocated from DMA zone. > We use arm64_dma_phys_limit as the upper limit for crash low memory. > > +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; > + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, > crash_base, crash_max); > > > Another option is to only introduce crashkernel=Y,low and, when that is > > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a > > 'high' option at all: > > > > crashkernel=1G - all within ZONE_DMA > > crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA > > 1G above ZONE_DMA > > > > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore > > the 'low' option. > > I think although the code is hard to make generic, the interface is better to > be relatively uniform. A user might have to maintain both x86 and arm64, and > so on. It's not a good thing that the difference is too big. There will be some difference as the 4G limit doesn't always hold for arm64 (though it's true in most cases). Anyway, we can probably simplify things a bit while following the documented behaviour: crashkernel=Y - current behaviour within ZONE_DMA crashkernel=Y,high - allocate from above ZONE_DMA crashkernel=Y,low - allocate within ZONE_DMA There is no fallback from crashkernel=Y. The question is whether we still want a default low allocation if crashkernel=Y,low is missing but 'high' is present. If we add this, I think we'd be consistent with kernel-parameters.txt for the 'low' description. A default 'low' is probably not that bad but I'm tempted to always mandate both 'high' and 'low'.
On 2022/4/28 0:04, Catalin Marinas wrote: > On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: >> On 2022/4/27 20:32, Catalin Marinas wrote: >>> I think one could always pass a default command line like: >>> >>> crashkernel=1G,high crashkernel=128M,low >>> >>> without much knowledge of the SoC memory layout. >> >> Yes, that's what the end result is. The user specify crashkernel=128M,low >> and the implementation ensure the 128M low memory is allocated from DMA zone. >> We use arm64_dma_phys_limit as the upper limit for crash low memory. >> >> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit >> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; >> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, >> crash_base, crash_max); >> >>> Another option is to only introduce crashkernel=Y,low and, when that is >>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a >>> 'high' option at all: >>> >>> crashkernel=1G - all within ZONE_DMA >>> crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA >>> 1G above ZONE_DMA >>> >>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore >>> the 'low' option. >> >> I think although the code is hard to make generic, the interface is better to >> be relatively uniform. A user might have to maintain both x86 and arm64, and >> so on. It's not a good thing that the difference is too big. > > There will be some difference as the 4G limit doesn't always hold for > arm64 (though it's true in most cases). Anyway, we can probably simplify > things a bit while following the documented behaviour: > > crashkernel=Y - current behaviour within ZONE_DMA > crashkernel=Y,high - allocate from above ZONE_DMA > crashkernel=Y,low - allocate within ZONE_DMA > > There is no fallback from crashkernel=Y. Yes, I followed your guidelines yesterday to modify the code. Now the code flow is much clearer. > > The question is whether we still want a default low allocation if > crashkernel=Y,low is missing but 'high' is present. If we add this, I > think we'd be consistent with kernel-parameters.txt for the 'low' > description. A default 'low' is probably not that bad but I'm tempted to > always mandate both 'high' and 'low'. Yes, I agree with you. Because the situation is complicated, the default value is hard to be accurate. It's better to let the user configure it according to the actual situation, they're also programmers. Whether mandate both 'high' and 'low', or allow only 'high' like x86(but the default value becomes zero). I prefer the latter. The size of 'low' maybe zero, for example, SMMU is enabled on the second kernel. If only high memory is required, only that high memory needs to be configured, seems more reasonable. >
Hi Catalin, Zhen Lei, On 04/27/22 at 05:04pm, Catalin Marinas wrote: > On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: > > On 2022/4/27 20:32, Catalin Marinas wrote: > > > I think one could always pass a default command line like: > > > > > > crashkernel=1G,high crashkernel=128M,low > > > > > > without much knowledge of the SoC memory layout. > > > > Yes, that's what the end result is. The user specify crashkernel=128M,low > > and the implementation ensure the 128M low memory is allocated from DMA zone. > > We use arm64_dma_phys_limit as the upper limit for crash low memory. > > > > +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > > + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; > > + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, > > crash_base, crash_max); > > > > > Another option is to only introduce crashkernel=Y,low and, when that is > > > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a > > > 'high' option at all: > > > > > > crashkernel=1G - all within ZONE_DMA > > > crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA > > > 1G above ZONE_DMA > > > > > > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore > > > the 'low' option. > > > > I think although the code is hard to make generic, the interface is better to > > be relatively uniform. A user might have to maintain both x86 and arm64, and > > so on. It's not a good thing that the difference is too big. > > There will be some difference as the 4G limit doesn't always hold for > arm64 (though it's true in most cases). Anyway, we can probably simplify > things a bit while following the documented behaviour: > > crashkernel=Y - current behaviour within ZONE_DMA > crashkernel=Y,high - allocate from above ZONE_DMA > crashkernel=Y,low - allocate within ZONE_DMA > > There is no fallback from crashkernel=Y. > > The question is whether we still want a default low allocation if > crashkernel=Y,low is missing but 'high' is present. If we add this, I > think we'd be consistent with kernel-parameters.txt for the 'low' > description. A default 'low' is probably not that bad but I'm tempted to > always mandate both 'high' and 'low'. Sorry to interrupt. Seems the ,high ,low and fallback are main concerns about this version. And I have the same concerns about them which comes from below points: 1) we may need to take best effort to keep ,high, ,low behaviour consistent on all ARCHes. Otherwise user/admin may be confused when they deploy/configure kdump on different machines of different ARCHes in the same LAB. I think we should try to avoid the confusion. 2) Fallback behaviour is important to our distros. The reason is we will provide default value with crashkernel=xxxM along kernel of distros. In this case, we hope the reservation will succeed by all means. The ,high and ,low is an option if customer likes to take with expertise. After going through arm64 memory init code, I got below summary about arm64_dma_phys_limit which is the first zone's upper limit. I think we can make use of it to facilitate to simplify code. ================================================================================ DMA DMA32 NORMAL 1)Raspberry Pi4 0~1G 3G~4G (above 4G) 2)Normal machine 0~4G 0 (above 4G) 3)Special machine (above 4G)~MAX 4)No DMA|DMA32 (above 4G)~MAX ------------------------------------------- arm64_dma_phys_limit 1)Raspberry Pi4 1G 2)Normal machine 4G 3)Special machine MAX 4)No DMA|DMA32 MAX Note: 3)Special machine means the machine's starting physical address is above 4G. WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has IOMMU hardware supporting. =================================================================================== I made a draft patch based on this patchset, please feel free to check and see if it's OK, or anything missing or wrongly understood. I removed reserve_crashkernel_high() and only keep reserve_crashkernel() and reserve_crashkernel_low() as the v21 did. Thanks Baoquan
On 2022/4/28 11:52, Baoquan He wrote: > On 04/28/22 at 11:40am, Baoquan He wrote: >> Hi Catalin, Zhen Lei, >> >> On 04/27/22 at 05:04pm, Catalin Marinas wrote: >>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: >>>> On 2022/4/27 20:32, Catalin Marinas wrote: >>>>> I think one could always pass a default command line like: >>>>> >>>>> crashkernel=1G,high crashkernel=128M,low >>>>> >>>>> without much knowledge of the SoC memory layout. >>>> >>>> Yes, that's what the end result is. The user specify crashkernel=128M,low >>>> and the implementation ensure the 128M low memory is allocated from DMA zone. >>>> We use arm64_dma_phys_limit as the upper limit for crash low memory. >>>> >>>> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit >>>> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; >>>> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, >>>> crash_base, crash_max); >>>> >>>>> Another option is to only introduce crashkernel=Y,low and, when that is >>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a >>>>> 'high' option at all: >>>>> >>>>> crashkernel=1G - all within ZONE_DMA >>>>> crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA >>>>> 1G above ZONE_DMA >>>>> >>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore >>>>> the 'low' option. >>>> >>>> I think although the code is hard to make generic, the interface is better to >>>> be relatively uniform. A user might have to maintain both x86 and arm64, and >>>> so on. It's not a good thing that the difference is too big. >>> >>> There will be some difference as the 4G limit doesn't always hold for >>> arm64 (though it's true in most cases). Anyway, we can probably simplify >>> things a bit while following the documented behaviour: >>> >>> crashkernel=Y - current behaviour within ZONE_DMA >>> crashkernel=Y,high - allocate from above ZONE_DMA >>> crashkernel=Y,low - allocate within ZONE_DMA >>> >>> There is no fallback from crashkernel=Y. >>> >>> The question is whether we still want a default low allocation if >>> crashkernel=Y,low is missing but 'high' is present. If we add this, I >>> think we'd be consistent with kernel-parameters.txt for the 'low' >>> description. A default 'low' is probably not that bad but I'm tempted to >>> always mandate both 'high' and 'low'. >> >> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns >> about this version. And I have the same concerns about them which comes >> from below points: >> 1) we may need to take best effort to keep ,high, ,low behaviour >> consistent on all ARCHes. Otherwise user/admin may be confused when they >> deploy/configure kdump on different machines of different ARCHes in the >> same LAB. I think we should try to avoid the confusion. Yes, but for someone who is configuring crashkernel= for the first time, he needs to read doc to understand how to configure it. The doc can show the recommended default value of 'low' size. After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"), the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size() is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low. + * -swiotlb size: user-specified with swiotlb= or default. - low_size = swiotlb_size_or_default() + (8UL<<20); + low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20); That means all ARCHs can explicit configure crashkernel=256M,low, instead of omitting it. This may be another way to avoid confusion. It's not hard for programmer-turned-user/admin. However, this requires us to forgo backward compatibility with the default size of 'low'. >> 2) Fallback behaviour is important to our distros. The reason is we will >> provide default value with crashkernel=xxxM along kernel of distros. In >> this case, we hope the reservation will succeed by all means. The ,high >> and ,low is an option if customer likes to take with expertise. OK, I got it. >> >> After going through arm64 memory init code, I got below summary about >> arm64_dma_phys_limit which is the first zone's upper limit. I think we >> can make use of it to facilitate to simplify code. >> ================================================================================ >> DMA DMA32 NORMAL >> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) >> 2)Normal machine 0~4G 0 (above 4G) >> 3)Special machine (above 4G)~MAX >> 4)No DMA|DMA32 (above 4G)~MAX arm64_memblock_init() reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") paging_init() | map_mem() | unflatten_device_tree or ACPI | ---- //Raspberry Pi4 get dma zone base on dtb or ACPI bootmem_init(); | | zone_sizes_init() | | of_dma_get_max_cpu_address | ----| //Update arm64_dma_phys_limit | ----| reserve_crashkernel() <-------------- //Because we need arm64_dma_phys_limit to be updated above request_standard_resources() >> >> ------------------------------------------- >> arm64_dma_phys_limit >> 1)Raspberry Pi4 1G >> 2)Normal machine 4G >> 3)Special machine MAX >> 4)No DMA|DMA32 MAX >> >> Note: 3)Special machine means the machine's starting physical address is above 4G. >> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has >> IOMMU hardware supporting. >> =================================================================================== >> >> I made a draft patch based on this patchset, please feel free to check and >> see if it's OK, or anything missing or wrongly understood. I removed >> reserve_crashkernel_high() and only keep reserve_crashkernel() and >> reserve_crashkernel_low() as the v21 did. > > Sorry, forgot attaching the draft patch. > > By the way, we can also have a simple version with basic ,high, ,low > support, no fallback. We can add fallback and other optimization later. > This can be plan B. Yes, That's what Catalin suggested also. Hi, Baoquan He: Without optimization, the whole Patch 3-4 and 6-7 can be dropped. Process after abstraction: if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) { reserve_crashkernel() //block mapping } else { //page mapping reserve_crashkernel() } ------------ Simplified real-world process --------- arm64_memblock_init() if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) reserve_crashkernel() paging_init() map_mem() if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) //block mapping else //page mapping unflatten_device_tree or ACPI bootmem_init(); zone_sizes_init() of_dma_get_max_cpu_address //Update arm64_dma_phys_limit if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)) reserve_crashkernel() > >
On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: > > > On 2022/4/28 11:52, Baoquan He wrote: > > On 04/28/22 at 11:40am, Baoquan He wrote: > >> Hi Catalin, Zhen Lei, > >> > >> On 04/27/22 at 05:04pm, Catalin Marinas wrote: > >>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: > >>>> On 2022/4/27 20:32, Catalin Marinas wrote: > >>>>> I think one could always pass a default command line like: > >>>>> > >>>>> crashkernel=1G,high crashkernel=128M,low > >>>>> > >>>>> without much knowledge of the SoC memory layout. > >>>> > >>>> Yes, that's what the end result is. The user specify crashkernel=128M,low > >>>> and the implementation ensure the 128M low memory is allocated from DMA zone. > >>>> We use arm64_dma_phys_limit as the upper limit for crash low memory. > >>>> > >>>> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > >>>> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; > >>>> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, > >>>> crash_base, crash_max); > >>>> > >>>>> Another option is to only introduce crashkernel=Y,low and, when that is > >>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a > >>>>> 'high' option at all: > >>>>> > >>>>> crashkernel=1G - all within ZONE_DMA > >>>>> crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA > >>>>> 1G above ZONE_DMA > >>>>> > >>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore > >>>>> the 'low' option. > >>>> > >>>> I think although the code is hard to make generic, the interface is better to > >>>> be relatively uniform. A user might have to maintain both x86 and arm64, and > >>>> so on. It's not a good thing that the difference is too big. > >>> > >>> There will be some difference as the 4G limit doesn't always hold for > >>> arm64 (though it's true in most cases). Anyway, we can probably simplify > >>> things a bit while following the documented behaviour: > >>> > >>> crashkernel=Y - current behaviour within ZONE_DMA > >>> crashkernel=Y,high - allocate from above ZONE_DMA > >>> crashkernel=Y,low - allocate within ZONE_DMA > >>> > >>> There is no fallback from crashkernel=Y. > >>> > >>> The question is whether we still want a default low allocation if > >>> crashkernel=Y,low is missing but 'high' is present. If we add this, I > >>> think we'd be consistent with kernel-parameters.txt for the 'low' > >>> description. A default 'low' is probably not that bad but I'm tempted to > >>> always mandate both 'high' and 'low'. > >> > >> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns > >> about this version. And I have the same concerns about them which comes > >> from below points: > >> 1) we may need to take best effort to keep ,high, ,low behaviour > >> consistent on all ARCHes. Otherwise user/admin may be confused when they > >> deploy/configure kdump on different machines of different ARCHes in the > >> same LAB. I think we should try to avoid the confusion. > > Yes, but for someone who is configuring crashkernel= for the first time, he > needs to read doc to understand how to configure it. The doc can show the > recommended default value of 'low' size. > > After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"), > the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size() > is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low. > > > + * -swiotlb size: user-specified with swiotlb= or default. > - low_size = swiotlb_size_or_default() + (8UL<<20); > + low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20); > > That means all ARCHs can explicit configure crashkernel=256M,low, instead of > omitting it. This may be another way to avoid confusion. It's not hard for > programmer-turned-user/admin. However, this requires us to forgo backward > compatibility with the default size of 'low'. We can make ,high and ,low simpler at first as they are alternative. If possible, we can also simplify the ,high ,low implementation on x86_64 if it truly brings better archievement on arm64. > > > >> 2) Fallback behaviour is important to our distros. The reason is we will > >> provide default value with crashkernel=xxxM along kernel of distros. In > >> this case, we hope the reservation will succeed by all means. The ,high > >> and ,low is an option if customer likes to take with expertise. > > OK, I got it. > > >> > >> After going through arm64 memory init code, I got below summary about > >> arm64_dma_phys_limit which is the first zone's upper limit. I think we > >> can make use of it to facilitate to simplify code. > >> ================================================================================ > >> DMA DMA32 NORMAL > >> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) > >> 2)Normal machine 0~4G 0 (above 4G) > >> 3)Special machine (above 4G)~MAX > >> 4)No DMA|DMA32 (above 4G)~MAX > > arm64_memblock_init() > reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") We don't need different code for this place of reservation as you are doing in this patchset, since arm64_dma_phys_limit is initialized as below. In fact, in arm64_memblock_init(), we have made memblock ready, we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if memblock_start_of_DRAM() is bigger than 4G, we possibly can call reserve_crashkernel() here too. phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; > paging_init() | > map_mem() | > unflatten_device_tree or ACPI | ---- //Raspberry Pi4 get dma zone base on dtb or ACPI > bootmem_init(); | | > zone_sizes_init() | | > of_dma_get_max_cpu_address | ----| > //Update arm64_dma_phys_limit | ----| > reserve_crashkernel() <-------------- //Because we need arm64_dma_phys_limit to be updated above > request_standard_resources() Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But arm64_dma_phys_limit could be 1G or 4G, that's why your optimization about BLOCKING may not be right since you assume the 4G boundary, while forgetting Raspberry Pi4 on which 1G is the boundary of low memory and high memory. So separating out BLOCKING optimization can let us focus on the crashkernel,high support. > > >> > >> ------------------------------------------- > >> arm64_dma_phys_limit > >> 1)Raspberry Pi4 1G > >> 2)Normal machine 4G > >> 3)Special machine MAX > >> 4)No DMA|DMA32 MAX > >> > >> Note: 3)Special machine means the machine's starting physical address is above 4G. > >> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has > >> IOMMU hardware supporting. > >> =================================================================================== > >> > >> I made a draft patch based on this patchset, please feel free to check and > >> see if it's OK, or anything missing or wrongly understood. I removed > >> reserve_crashkernel_high() and only keep reserve_crashkernel() and > >> reserve_crashkernel_low() as the v21 did. > > > > Sorry, forgot attaching the draft patch. > > > > By the way, we can also have a simple version with basic ,high, ,low > > support, no fallback. We can add fallback and other optimization later. > > This can be plan B. > > Yes, That's what Catalin suggested also. > > Hi, Baoquan He: > Without optimization, the whole Patch 3-4 and 6-7 can be dropped. > > Process after abstraction: > if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) { > reserve_crashkernel() > //block mapping > } else { > //page mapping > reserve_crashkernel() > } > > ------------ Simplified real-world process --------- Yeah, this looks clearer. I would like to see a version with them. > arm64_memblock_init() Before reserve_crashkernel(), we can update arm64_dma_phys_limit as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or memblock_start_of_DRAM() is bigger than 4G. Then we go with: if (!arm64_dma_phys_limit) reserve_crashkernel(); Just personal opinion, please check if it's appropriate to handle case 3) which has physical starting memory above 4G here. > if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) > reserve_crashkernel() > paging_init() > map_mem() > if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) > //block mapping > else > //page mapping > unflatten_device_tree or ACPI > bootmem_init(); > zone_sizes_init() > of_dma_get_max_cpu_address > //Update arm64_dma_phys_limit > if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)) > reserve_crashkernel() The rest sounds good with optimization code split out.
On 2022/4/29 11:24, Baoquan He wrote: > On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: >> >> >> On 2022/4/28 11:52, Baoquan He wrote: >>> On 04/28/22 at 11:40am, Baoquan He wrote: >>>> Hi Catalin, Zhen Lei, >>>> >>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote: >>>>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: >>>>>> On 2022/4/27 20:32, Catalin Marinas wrote: >>>>>>> I think one could always pass a default command line like: >>>>>>> >>>>>>> crashkernel=1G,high crashkernel=128M,low >>>>>>> >>>>>>> without much knowledge of the SoC memory layout. >>>>>> >>>>>> Yes, that's what the end result is. The user specify crashkernel=128M,low >>>>>> and the implementation ensure the 128M low memory is allocated from DMA zone. >>>>>> We use arm64_dma_phys_limit as the upper limit for crash low memory. >>>>>> >>>>>> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit >>>>>> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; >>>>>> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, >>>>>> crash_base, crash_max); >>>>>> >>>>>>> Another option is to only introduce crashkernel=Y,low and, when that is >>>>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a >>>>>>> 'high' option at all: >>>>>>> >>>>>>> crashkernel=1G - all within ZONE_DMA >>>>>>> crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA >>>>>>> 1G above ZONE_DMA >>>>>>> >>>>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore >>>>>>> the 'low' option. >>>>>> >>>>>> I think although the code is hard to make generic, the interface is better to >>>>>> be relatively uniform. A user might have to maintain both x86 and arm64, and >>>>>> so on. It's not a good thing that the difference is too big. >>>>> >>>>> There will be some difference as the 4G limit doesn't always hold for >>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify >>>>> things a bit while following the documented behaviour: >>>>> >>>>> crashkernel=Y - current behaviour within ZONE_DMA >>>>> crashkernel=Y,high - allocate from above ZONE_DMA >>>>> crashkernel=Y,low - allocate within ZONE_DMA >>>>> >>>>> There is no fallback from crashkernel=Y. >>>>> >>>>> The question is whether we still want a default low allocation if >>>>> crashkernel=Y,low is missing but 'high' is present. If we add this, I >>>>> think we'd be consistent with kernel-parameters.txt for the 'low' >>>>> description. A default 'low' is probably not that bad but I'm tempted to >>>>> always mandate both 'high' and 'low'. >>>> >>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns >>>> about this version. And I have the same concerns about them which comes >>>> from below points: >>>> 1) we may need to take best effort to keep ,high, ,low behaviour >>>> consistent on all ARCHes. Otherwise user/admin may be confused when they >>>> deploy/configure kdump on different machines of different ARCHes in the >>>> same LAB. I think we should try to avoid the confusion. >> >> Yes, but for someone who is configuring crashkernel= for the first time, he >> needs to read doc to understand how to configure it. The doc can show the >> recommended default value of 'low' size. >> >> After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"), >> the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size() >> is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low. >> >> >> + * -swiotlb size: user-specified with swiotlb= or default. >> - low_size = swiotlb_size_or_default() + (8UL<<20); >> + low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20); >> >> That means all ARCHs can explicit configure crashkernel=256M,low, instead of >> omitting it. This may be another way to avoid confusion. It's not hard for >> programmer-turned-user/admin. However, this requires us to forgo backward >> compatibility with the default size of 'low'. > > We can make ,high and ,low simpler at first as they are alternative. If > possible, we can also simplify the ,high ,low implementation on x86_64 > if it truly brings better archievement on arm64. OK, I plan to remove optimization, fallback and default low size, to follow the suggestion of Catalin first. But there's one minor point of contention. 1) Both "crashkernel=X,high" and "crashkernel=X,low" must be present. 2) Both "crashkernel=X,high" and "crashkernel=X,low" are present. or Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero. I prefer 2), how about you? > >> >> >>>> 2) Fallback behaviour is important to our distros. The reason is we will >>>> provide default value with crashkernel=xxxM along kernel of distros. In >>>> this case, we hope the reservation will succeed by all means. The ,high >>>> and ,low is an option if customer likes to take with expertise. >> >> OK, I got it. >> >>>> >>>> After going through arm64 memory init code, I got below summary about >>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we >>>> can make use of it to facilitate to simplify code. >>>> ================================================================================ >>>> DMA DMA32 NORMAL >>>> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) >>>> 2)Normal machine 0~4G 0 (above 4G) >>>> 3)Special machine (above 4G)~MAX >>>> 4)No DMA|DMA32 (above 4G)~MAX >> >> arm64_memblock_init() >> reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") > We don't need different code for this place of reservation as you are > doing in this patchset, since arm64_dma_phys_limit is initialized as > below. In fact, in arm64_memblock_init(), we have made memblock ready, > we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if > memblock_start_of_DRAM() is bigger than 4G, we possibly can call > reserve_crashkernel() here too. Yes. Maybe all the devices in this environment are 64-bit. One way I know of allowing 32-bit devices to access high memory without SMMU is: Set a fixed value for the upper 32 bits. In this case, the DMA zone should be [phys_start, phys_start + 4G). > > phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; > >> paging_init() | >> map_mem() | >> unflatten_device_tree or ACPI | ---- //Raspberry Pi4 get dma zone base on dtb or ACPI >> bootmem_init(); | | >> zone_sizes_init() | | >> of_dma_get_max_cpu_address | ----| >> //Update arm64_dma_phys_limit | ----| >> reserve_crashkernel() <-------------- //Because we need arm64_dma_phys_limit to be updated above >> request_standard_resources() > > Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case > as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But > arm64_dma_phys_limit could be 1G or 4G, that's why your optimization > about BLOCKING may not be right since you assume the 4G boundary, while > forgetting Raspberry Pi4 on which 1G is the boundary of low memory and No, no, my optimization for Raspberry Pi4 is fine. I do page mapping for memory under 4G and block mapping for memory above 4G. But still try to reserve crash low memory from DMA zone. So when 1G is the boundary, it's just not fully optimized, the memory 1-4G are mapped as page. > high memory. So separating out BLOCKING optimization can let us focus on > the crashkernel,high support. > >> >>>> >>>> ------------------------------------------- >>>> arm64_dma_phys_limit >>>> 1)Raspberry Pi4 1G >>>> 2)Normal machine 4G >>>> 3)Special machine MAX >>>> 4)No DMA|DMA32 MAX >>>> >>>> Note: 3)Special machine means the machine's starting physical address is above 4G. >>>> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has >>>> IOMMU hardware supporting. >>>> =================================================================================== >>>> >>>> I made a draft patch based on this patchset, please feel free to check and >>>> see if it's OK, or anything missing or wrongly understood. I removed >>>> reserve_crashkernel_high() and only keep reserve_crashkernel() and >>>> reserve_crashkernel_low() as the v21 did. >>> >>> Sorry, forgot attaching the draft patch. >>> >>> By the way, we can also have a simple version with basic ,high, ,low >>> support, no fallback. We can add fallback and other optimization later. >>> This can be plan B. >> >> Yes, That's what Catalin suggested also. >> >> Hi, Baoquan He: >> Without optimization, the whole Patch 3-4 and 6-7 can be dropped. >> >> Process after abstraction: >> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) { >> reserve_crashkernel() >> //block mapping >> } else { >> //page mapping >> reserve_crashkernel() >> } >> >> ------------ Simplified real-world process --------- > Yeah, this looks clearer. I would like to see a version with them. > >> arm64_memblock_init() > Before reserve_crashkernel(), we can update arm64_dma_phys_limit > as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or > memblock_start_of_DRAM() is bigger than 4G. > Then we go with: > if (!arm64_dma_phys_limit) > reserve_crashkernel(); > > Just personal opinion, please check if it's appropriate to handle case > 3) which has physical starting memory above 4G here. OK, I'll write it down and consider it in the future optimization. > >> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) >> reserve_crashkernel() > >> paging_init() >> map_mem() >> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) >> //block mapping >> else >> //page mapping >> unflatten_device_tree or ACPI >> bootmem_init(); >> zone_sizes_init() >> of_dma_get_max_cpu_address >> //Update arm64_dma_phys_limit >> if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)) >> reserve_crashkernel() > > The rest sounds good with optimization code split out. > > . >
On 2022/4/29 16:02, Leizhen (ThunderTown) wrote: > > > On 2022/4/29 11:24, Baoquan He wrote: >> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: >>> >>> >>> On 2022/4/28 11:52, Baoquan He wrote: >>>> On 04/28/22 at 11:40am, Baoquan He wrote: >>>>> Hi Catalin, Zhen Lei, >>>>> >>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote: >>>>>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: >>>>>>> On 2022/4/27 20:32, Catalin Marinas wrote: >>>>>>>> I think one could always pass a default command line like: >>>>>>>> >>>>>>>> crashkernel=1G,high crashkernel=128M,low >>>>>>>> >>>>>>>> without much knowledge of the SoC memory layout. >>>>>>> >>>>>>> Yes, that's what the end result is. The user specify crashkernel=128M,low >>>>>>> and the implementation ensure the 128M low memory is allocated from DMA zone. >>>>>>> We use arm64_dma_phys_limit as the upper limit for crash low memory. >>>>>>> >>>>>>> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit >>>>>>> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; >>>>>>> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, >>>>>>> crash_base, crash_max); >>>>>>> >>>>>>>> Another option is to only introduce crashkernel=Y,low and, when that is >>>>>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a >>>>>>>> 'high' option at all: >>>>>>>> >>>>>>>> crashkernel=1G - all within ZONE_DMA >>>>>>>> crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA >>>>>>>> 1G above ZONE_DMA >>>>>>>> >>>>>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore >>>>>>>> the 'low' option. >>>>>>> >>>>>>> I think although the code is hard to make generic, the interface is better to >>>>>>> be relatively uniform. A user might have to maintain both x86 and arm64, and >>>>>>> so on. It's not a good thing that the difference is too big. >>>>>> >>>>>> There will be some difference as the 4G limit doesn't always hold for >>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify >>>>>> things a bit while following the documented behaviour: >>>>>> >>>>>> crashkernel=Y - current behaviour within ZONE_DMA >>>>>> crashkernel=Y,high - allocate from above ZONE_DMA >>>>>> crashkernel=Y,low - allocate within ZONE_DMA >>>>>> >>>>>> There is no fallback from crashkernel=Y. >>>>>> >>>>>> The question is whether we still want a default low allocation if >>>>>> crashkernel=Y,low is missing but 'high' is present. If we add this, I >>>>>> think we'd be consistent with kernel-parameters.txt for the 'low' >>>>>> description. A default 'low' is probably not that bad but I'm tempted to >>>>>> always mandate both 'high' and 'low'. >>>>> >>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns >>>>> about this version. And I have the same concerns about them which comes >>>>> from below points: >>>>> 1) we may need to take best effort to keep ,high, ,low behaviour >>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they >>>>> deploy/configure kdump on different machines of different ARCHes in the >>>>> same LAB. I think we should try to avoid the confusion. >>> >>> Yes, but for someone who is configuring crashkernel= for the first time, he >>> needs to read doc to understand how to configure it. The doc can show the >>> recommended default value of 'low' size. >>> >>> After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"), >>> the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size() >>> is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low. >>> >>> >>> + * -swiotlb size: user-specified with swiotlb= or default. >>> - low_size = swiotlb_size_or_default() + (8UL<<20); >>> + low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20); >>> >>> That means all ARCHs can explicit configure crashkernel=256M,low, instead of >>> omitting it. This may be another way to avoid confusion. It's not hard for >>> programmer-turned-user/admin. However, this requires us to forgo backward >>> compatibility with the default size of 'low'. >> >> We can make ,high and ,low simpler at first as they are alternative. If >> possible, we can also simplify the ,high ,low implementation on x86_64 >> if it truly brings better archievement on arm64. > > OK, I plan to remove optimization, fallback and default low size, to follow the > suggestion of Catalin first. But there's one minor point of contention. > > 1) Both "crashkernel=X,high" and "crashkernel=X,low" must be present. > 2) Both "crashkernel=X,high" and "crashkernel=X,low" are present. > or > Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero. > > I prefer 2), how about you? > >> >>> >>> >>>>> 2) Fallback behaviour is important to our distros. The reason is we will >>>>> provide default value with crashkernel=xxxM along kernel of distros. In >>>>> this case, we hope the reservation will succeed by all means. The ,high >>>>> and ,low is an option if customer likes to take with expertise. >>> >>> OK, I got it. >>> >>>>> >>>>> After going through arm64 memory init code, I got below summary about >>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we >>>>> can make use of it to facilitate to simplify code. >>>>> ================================================================================ >>>>> DMA DMA32 NORMAL >>>>> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) >>>>> 2)Normal machine 0~4G 0 (above 4G) >>>>> 3)Special machine (above 4G)~MAX >>>>> 4)No DMA|DMA32 (above 4G)~MAX >>> >>> arm64_memblock_init() >>> reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") >> We don't need different code for this place of reservation as you are >> doing in this patchset, since arm64_dma_phys_limit is initialized as >> below. In fact, in arm64_memblock_init(), we have made memblock ready, >> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if >> memblock_start_of_DRAM() is bigger than 4G, we possibly can call >> reserve_crashkernel() here too. > > Yes. Maybe all the devices in this environment are 64-bit. One way I know of allowing > 32-bit devices to access high memory without SMMU is: Set a fixed value for the upper > 32 bits. In this case, the DMA zone should be [phys_start, phys_start + 4G). I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA offsets in the max_zone_phys() calculation") Currently, the kernel assumes that if RAM starts above 32-bit (or zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and such constrained devices have a hardwired DMA offset. In practice, we haven't noticed any such hardware so let's assume that we can expand ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly, ZONE_DMA is expanded to the 4GB limit if no RAM addressable by zone_bits. So your suggestion is feasible. > >> >> phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; >> >>> paging_init() | >>> map_mem() | >>> unflatten_device_tree or ACPI | ---- //Raspberry Pi4 get dma zone base on dtb or ACPI >>> bootmem_init(); | | >>> zone_sizes_init() | | >>> of_dma_get_max_cpu_address | ----| >>> //Update arm64_dma_phys_limit | ----| >>> reserve_crashkernel() <-------------- //Because we need arm64_dma_phys_limit to be updated above >>> request_standard_resources() >> >> Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case >> as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But >> arm64_dma_phys_limit could be 1G or 4G, that's why your optimization >> about BLOCKING may not be right since you assume the 4G boundary, while >> forgetting Raspberry Pi4 on which 1G is the boundary of low memory and > > No, no, my optimization for Raspberry Pi4 is fine. I do page mapping for memory > under 4G and block mapping for memory above 4G. But still try to reserve crash > low memory from DMA zone. So when 1G is the boundary, it's just not fully optimized, > the memory 1-4G are mapped as page. > >> high memory. So separating out BLOCKING optimization can let us focus on >> the crashkernel,high support. >> >>> >>>>> >>>>> ------------------------------------------- >>>>> arm64_dma_phys_limit >>>>> 1)Raspberry Pi4 1G >>>>> 2)Normal machine 4G >>>>> 3)Special machine MAX >>>>> 4)No DMA|DMA32 MAX >>>>> >>>>> Note: 3)Special machine means the machine's starting physical address is above 4G. >>>>> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has >>>>> IOMMU hardware supporting. >>>>> =================================================================================== >>>>> >>>>> I made a draft patch based on this patchset, please feel free to check and >>>>> see if it's OK, or anything missing or wrongly understood. I removed >>>>> reserve_crashkernel_high() and only keep reserve_crashkernel() and >>>>> reserve_crashkernel_low() as the v21 did. >>>> >>>> Sorry, forgot attaching the draft patch. >>>> >>>> By the way, we can also have a simple version with basic ,high, ,low >>>> support, no fallback. We can add fallback and other optimization later. >>>> This can be plan B. >>> >>> Yes, That's what Catalin suggested also. >>> >>> Hi, Baoquan He: >>> Without optimization, the whole Patch 3-4 and 6-7 can be dropped. >>> >>> Process after abstraction: >>> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) { >>> reserve_crashkernel() >>> //block mapping >>> } else { >>> //page mapping >>> reserve_crashkernel() >>> } >>> >>> ------------ Simplified real-world process --------- >> Yeah, this looks clearer. I would like to see a version with them. >> >>> arm64_memblock_init() >> Before reserve_crashkernel(), we can update arm64_dma_phys_limit >> as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or >> memblock_start_of_DRAM() is bigger than 4G. >> Then we go with: >> if (!arm64_dma_phys_limit) >> reserve_crashkernel(); >> >> Just personal opinion, please check if it's appropriate to handle case >> 3) which has physical starting memory above 4G here. > > OK, I'll write it down and consider it in the future optimization. > >> >>> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) >>> reserve_crashkernel() >> >>> paging_init() >>> map_mem() >>> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) >>> //block mapping >>> else >>> //page mapping >>> unflatten_device_tree or ACPI >>> bootmem_init(); >>> zone_sizes_init() >>> of_dma_get_max_cpu_address >>> //Update arm64_dma_phys_limit >>> if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)) >>> reserve_crashkernel() >> >> The rest sounds good with optimization code split out. >> >> . >> >
On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote: > On 2022/4/29 16:02, Leizhen (ThunderTown) wrote: > > On 2022/4/29 11:24, Baoquan He wrote: > >> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: > >>> On 2022/4/28 11:52, Baoquan He wrote: > >>>> On 04/28/22 at 11:40am, Baoquan He wrote: > >>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote: > >>>>>> There will be some difference as the 4G limit doesn't always hold for > >>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify > >>>>>> things a bit while following the documented behaviour: > >>>>>> > >>>>>> crashkernel=Y - current behaviour within ZONE_DMA > >>>>>> crashkernel=Y,high - allocate from above ZONE_DMA > >>>>>> crashkernel=Y,low - allocate within ZONE_DMA [...] > >>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns > >>>>> about this version. And I have the same concerns about them which comes > >>>>> from below points: > >>>>> 1) we may need to take best effort to keep ,high, ,low behaviour > >>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they > >>>>> deploy/configure kdump on different machines of different ARCHes in the > >>>>> same LAB. I think we should try to avoid the confusion. I guess by all arches you mean just x86 here. Since the code is not generic, all arches do their own stuff. > > OK, I plan to remove optimization, fallback and default low size, to follow the > > suggestion of Catalin first. But there's one minor point of contention. > > > > 1) Both "crashkernel=X,high" and "crashkernel=X,low" must be present. > > 2) Both "crashkernel=X,high" and "crashkernel=X,low" are present. > > or > > Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero. > > > > I prefer 2), how about you? (2) works for me as well. We keep these simple as "expert" options and allow crashkernel= to fall back to 'high' if not sufficient memory in ZONE_DMA. That would be a slight change from the current behaviour but, as Zhen Lei said, with the old tools it's just moving the error around, the crashkernel wouldn't be available in either case. > >>>>> 2) Fallback behaviour is important to our distros. The reason is we will > >>>>> provide default value with crashkernel=xxxM along kernel of distros. In > >>>>> this case, we hope the reservation will succeed by all means. The ,high > >>>>> and ,low is an option if customer likes to take with expertise. OK, that's good feedback. So, to recap, IIUC you are fine with: crashkernel=Y - allocate within ZONE_DMA with fallback above with a default in ZONE_DMA (like x86, 256M or swiotlb size) crashkernel=Y,high - allocate from above ZONE_DMA crashkernel=Y,low - allocate within ZONE_DMA 'crashkernel' overrides the high and low while the latter two can be passed independently. > >>>>> After going through arm64 memory init code, I got below summary about > >>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we > >>>>> can make use of it to facilitate to simplify code. > >>>>> ================================================================================ > >>>>> DMA DMA32 NORMAL > >>>>> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) > >>>>> 2)Normal machine 0~4G 0 (above 4G) > >>>>> 3)Special machine (above 4G)~MAX > >>>>> 4)No DMA|DMA32 (above 4G)~MAX > >>> > >>> arm64_memblock_init() > >>> reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") > >> We don't need different code for this place of reservation as you are > >> doing in this patchset, since arm64_dma_phys_limit is initialized as > >> below. In fact, in arm64_memblock_init(), we have made memblock ready, > >> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if > >> memblock_start_of_DRAM() is bigger than 4G, we possibly can call > >> reserve_crashkernel() here too. > > > > Yes. Maybe all the devices in this environment are 64-bit. One way I > > know of allowing 32-bit devices to access high memory without SMMU > > is: Set a fixed value for the upper 32 bits. In this case, the DMA > > zone should be [phys_start, phys_start + 4G). We decided that this case doesn't really exists for arm64 platforms (no need for special ZONE_DMA). > I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA > offsets in the max_zone_phys() calculation") > > Currently, the kernel assumes that if RAM starts above 32-bit (or > zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and > such constrained devices have a hardwired DMA offset. In practice, we > haven't noticed any such hardware so let's assume that we can expand > ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly, > ZONE_DMA is expanded to the 4GB limit if no RAM addressable by > zone_bits. I think the above log is slightly confusing. If the DRAM starts above 4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes ZONE_DMA to 4G. I don't remember why we did this last part, maybe in case we get incorrect firmware tables, otherwise we could have extended ZONE_DMA to end of DRAM. Zhen Lei, if we agreed on the crashkernel behaviour, could you please post a series that does the above parsing allocation? Ignore the optimisations, we can look at them afterwards. Thanks.
On 2022/5/4 6:00, Catalin Marinas wrote: > On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote: >> On 2022/4/29 16:02, Leizhen (ThunderTown) wrote: >>> On 2022/4/29 11:24, Baoquan He wrote: >>>> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: >>>>> On 2022/4/28 11:52, Baoquan He wrote: >>>>>> On 04/28/22 at 11:40am, Baoquan He wrote: >>>>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote: >>>>>>>> There will be some difference as the 4G limit doesn't always hold for >>>>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify >>>>>>>> things a bit while following the documented behaviour: >>>>>>>> >>>>>>>> crashkernel=Y - current behaviour within ZONE_DMA >>>>>>>> crashkernel=Y,high - allocate from above ZONE_DMA >>>>>>>> crashkernel=Y,low - allocate within ZONE_DMA > [...] >>>>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns >>>>>>> about this version. And I have the same concerns about them which comes >>>>>>> from below points: >>>>>>> 1) we may need to take best effort to keep ,high, ,low behaviour >>>>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they >>>>>>> deploy/configure kdump on different machines of different ARCHes in the >>>>>>> same LAB. I think we should try to avoid the confusion. > > I guess by all arches you mean just x86 here. Since the code is not > generic, all arches do their own stuff. > >>> OK, I plan to remove optimization, fallback and default low size, to follow the >>> suggestion of Catalin first. But there's one minor point of contention. >>> >>> 1) Both "crashkernel=X,high" and "crashkernel=X,low" must be present. >>> 2) Both "crashkernel=X,high" and "crashkernel=X,low" are present. >>> or >>> Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero. >>> >>> I prefer 2), how about you? > > (2) works for me as well. We keep these simple as "expert" options and Okay, so I'll follow 2) to update the code. > allow crashkernel= to fall back to 'high' if not sufficient memory in > ZONE_DMA. That would be a slight change from the current behaviour but, > as Zhen Lei said, with the old tools it's just moving the error around, > the crashkernel wouldn't be available in either case. > >>>>>>> 2) Fallback behaviour is important to our distros. The reason is we will >>>>>>> provide default value with crashkernel=xxxM along kernel of distros. In >>>>>>> this case, we hope the reservation will succeed by all means. The ,high >>>>>>> and ,low is an option if customer likes to take with expertise. > > OK, that's good feedback. > > So, to recap, IIUC you are fine with: > > crashkernel=Y - allocate within ZONE_DMA with fallback > above with a default in ZONE_DMA (like > x86, 256M or swiotlb size) > crashkernel=Y,high - allocate from above ZONE_DMA > crashkernel=Y,low - allocate within ZONE_DMA > > 'crashkernel' overrides the high and low while the latter two can be > passed independently. > >>>>>>> After going through arm64 memory init code, I got below summary about >>>>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we >>>>>>> can make use of it to facilitate to simplify code. >>>>>>> ================================================================================ >>>>>>> DMA DMA32 NORMAL >>>>>>> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) >>>>>>> 2)Normal machine 0~4G 0 (above 4G) >>>>>>> 3)Special machine (above 4G)~MAX >>>>>>> 4)No DMA|DMA32 (above 4G)~MAX >>>>> >>>>> arm64_memblock_init() >>>>> reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") >>>> We don't need different code for this place of reservation as you are >>>> doing in this patchset, since arm64_dma_phys_limit is initialized as >>>> below. In fact, in arm64_memblock_init(), we have made memblock ready, >>>> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if >>>> memblock_start_of_DRAM() is bigger than 4G, we possibly can call >>>> reserve_crashkernel() here too. >>> >>> Yes. Maybe all the devices in this environment are 64-bit. One way I >>> know of allowing 32-bit devices to access high memory without SMMU >>> is: Set a fixed value for the upper 32 bits. In this case, the DMA >>> zone should be [phys_start, phys_start + 4G). > > We decided that this case doesn't really exists for arm64 platforms (no > need for special ZONE_DMA). > >> I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA >> offsets in the max_zone_phys() calculation") >> >> Currently, the kernel assumes that if RAM starts above 32-bit (or >> zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and >> such constrained devices have a hardwired DMA offset. In practice, we >> haven't noticed any such hardware so let's assume that we can expand >> ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly, >> ZONE_DMA is expanded to the 4GB limit if no RAM addressable by >> zone_bits. > > I think the above log is slightly confusing. If the DRAM starts above > 4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but > above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes > ZONE_DMA to 4G. I don't remember why we did this last part, maybe in > case we get incorrect firmware tables, otherwise we could have extended > ZONE_DMA to end of DRAM. > > Zhen Lei, if we agreed on the crashkernel behaviour, could you please > post a series that does the above parsing allocation? Ignore the > optimisations, we can look at them afterwards. OK, I've changed the code before the festival, and I'll test it today. > > Thanks. >
On 05/03/22 at 11:00pm, Catalin Marinas wrote: > On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote: > > On 2022/4/29 16:02, Leizhen (ThunderTown) wrote: > > > On 2022/4/29 11:24, Baoquan He wrote: > > >> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: > > >>> On 2022/4/28 11:52, Baoquan He wrote: > > >>>> On 04/28/22 at 11:40am, Baoquan He wrote: > > >>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote: > > >>>>>> There will be some difference as the 4G limit doesn't always hold for > > >>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify > > >>>>>> things a bit while following the documented behaviour: > > >>>>>> > > >>>>>> crashkernel=Y - current behaviour within ZONE_DMA > > >>>>>> crashkernel=Y,high - allocate from above ZONE_DMA > > >>>>>> crashkernel=Y,low - allocate within ZONE_DMA > [...] > > >>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns > > >>>>> about this version. And I have the same concerns about them which comes > > >>>>> from below points: > > >>>>> 1) we may need to take best effort to keep ,high, ,low behaviour > > >>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they > > >>>>> deploy/configure kdump on different machines of different ARCHes in the > > >>>>> same LAB. I think we should try to avoid the confusion. > > I guess by all arches you mean just x86 here. Since the code is not > generic, all arches do their own stuff. Right. Since currently only x86 has crashkernel,high|low support. From the distros and customer's point of view, we would like to see the same feature has the same or similar behaviour. This will ease operation and maintaining. E.g on the cloud platform, the base of it could be any ARCH, x86, arm64. The inconsistent behaviour could cause confusion. Certainly, the underlying implementation may be different. Surely, if arm64 has its own manner because of reasons, we can add document to note that. > > > > OK, I plan to remove optimization, fallback and default low size, to follow the > > > suggestion of Catalin first. But there's one minor point of contention. > > > > > > 1) Both "crashkernel=X,high" and "crashkernel=X,low" must be present. > > > 2) Both "crashkernel=X,high" and "crashkernel=X,low" are present. > > > or > > > Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero. > > > > > > I prefer 2), how about you? > > (2) works for me as well. We keep these simple as "expert" options and > allow crashkernel= to fall back to 'high' if not sufficient memory in > ZONE_DMA. That would be a slight change from the current behaviour but, > as Zhen Lei said, with the old tools it's just moving the error around, > the crashkernel wouldn't be available in either case. > > > >>>>> 2) Fallback behaviour is important to our distros. The reason is we will > > >>>>> provide default value with crashkernel=xxxM along kernel of distros. In > > >>>>> this case, we hope the reservation will succeed by all means. The ,high > > >>>>> and ,low is an option if customer likes to take with expertise. > > OK, that's good feedback. > > So, to recap, IIUC you are fine with: > > crashkernel=Y - allocate within ZONE_DMA with fallback > above with a default in ZONE_DMA (like > x86, 256M or swiotlb size) Ack to this one. > crashkernel=Y,high - allocate from above ZONE_DMA Not exactly. If there's only ZONE_DMA, crashkernel,high will be reserved in ZONE_DMA, and crashkernel,low will be ignored. Other than this, ack. > crashkernel=Y,low - allocate within ZONE_DMA Ack to this one. > > 'crashkernel' overrides the high and low while the latter two can be > passed independently. crashkernel=,high can be passed independently, then a crashkernel=,low is needed implicitly. If people don't want crashkernel=,low explicitly, crashkernel=0,low need be specified. An independent crashkernel=,low makes no sense. Crashkernel=,low should be paird with crashkernel=,high. My personal opinion according to the existed senmantics on x86. Otherwise, the guidance of crashkernel= |,high|,low reservation will be complicated to write. > > > >>>>> After going through arm64 memory init code, I got below summary about > > >>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we > > >>>>> can make use of it to facilitate to simplify code. > > >>>>> ================================================================================ > > >>>>> DMA DMA32 NORMAL > > >>>>> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) > > >>>>> 2)Normal machine 0~4G 0 (above 4G) > > >>>>> 3)Special machine (above 4G)~MAX > > >>>>> 4)No DMA|DMA32 (above 4G)~MAX > > >>> > > >>> arm64_memblock_init() > > >>> reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") > > >> We don't need different code for this place of reservation as you are > > >> doing in this patchset, since arm64_dma_phys_limit is initialized as > > >> below. In fact, in arm64_memblock_init(), we have made memblock ready, > > >> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if > > >> memblock_start_of_DRAM() is bigger than 4G, we possibly can call > > >> reserve_crashkernel() here too. > > > > > > Yes. Maybe all the devices in this environment are 64-bit. One way I > > > know of allowing 32-bit devices to access high memory without SMMU > > > is: Set a fixed value for the upper 32 bits. In this case, the DMA > > > zone should be [phys_start, phys_start + 4G). > > We decided that this case doesn't really exists for arm64 platforms (no > need for special ZONE_DMA). > > > I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA > > offsets in the max_zone_phys() calculation") > > > > Currently, the kernel assumes that if RAM starts above 32-bit (or > > zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and > > such constrained devices have a hardwired DMA offset. In practice, we > > haven't noticed any such hardware so let's assume that we can expand > > ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly, > > ZONE_DMA is expanded to the 4GB limit if no RAM addressable by > > zone_bits. > > I think the above log is slightly confusing. If the DRAM starts above > 4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but > above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes > ZONE_DMA to 4G. I don't remember why we did this last part, maybe in > case we get incorrect firmware tables, otherwise we could have extended > ZONE_DMA to end of DRAM. > > Zhen Lei, if we agreed on the crashkernel behaviour, could you please > post a series that does the above parsing allocation? Ignore the > optimisations, we can look at them afterwards. > > Thanks. > > -- > Catalin >
On Thu, May 05, 2022 at 11:00:19AM +0800, Baoquan He wrote: > On 05/03/22 at 11:00pm, Catalin Marinas wrote: > > So, to recap, IIUC you are fine with: > > > > crashkernel=Y - allocate within ZONE_DMA with fallback > > above with a default in ZONE_DMA (like > > x86, 256M or swiotlb size) > > Ack to this one. > > > > crashkernel=Y,high - allocate from above ZONE_DMA > > Not exactly. If there's only ZONE_DMA, crashkernel,high will > be reserved in ZONE_DMA, and crashkernel,low will be ignored. > Other than this, ack. Yes, that's fine. > > crashkernel=Y,low - allocate within ZONE_DMA > > Ack to this one. > > > > 'crashkernel' overrides the high and low while the latter two can be > > passed independently. > > crashkernel=,high can be passed independently, then a crashkernel=,low > is needed implicitly. If people don't want crashkernel=,low > explicitly, crashkernel=0,low need be specified. I find this complicating the interface. I don't know the background to the x86 implementation but we diverge already on arm64 since we talk about ZONE_DMA rather than 4G limit (though for most platforms these would be the same). I guess we could restate the difference between crashkernel= and crashkernel=,high as the hint to go for allocation above ZONE_DMA first. > An independent crashkernel=,low makes no sense. Crashkernel=,low > should be paird with crashkernel=,high. You could argue that crashkernel=,low gives the current crashkernel= behaviour, i.e. either all within ZONE_DMA or fail to allocate. So it may have some value on its own. > My personal opinion according to the existed senmantics on x86. > Otherwise, the guidance of crashkernel= |,high|,low reservation > will be complicated to write. It's more that I find the current semantics unnecessarily confusing. But even reading the x86_64 text it's not that clear. For example the default low allocation for crashkernel= and crashkernel=,high is only mentioned in the crashkernel=,low description.
On 05/05/22 at 03:20pm, Catalin Marinas wrote: > On Thu, May 05, 2022 at 11:00:19AM +0800, Baoquan He wrote: > > On 05/03/22 at 11:00pm, Catalin Marinas wrote: > > > So, to recap, IIUC you are fine with: > > > > > > crashkernel=Y - allocate within ZONE_DMA with fallback > > > above with a default in ZONE_DMA (like > > > x86, 256M or swiotlb size) > > > > Ack to this one. > > > > > > > crashkernel=Y,high - allocate from above ZONE_DMA > > > > Not exactly. If there's only ZONE_DMA, crashkernel,high will > > be reserved in ZONE_DMA, and crashkernel,low will be ignored. > > Other than this, ack. > > Yes, that's fine. > > > > crashkernel=Y,low - allocate within ZONE_DMA > > > > Ack to this one. > > > > > > 'crashkernel' overrides the high and low while the latter two can be > > > passed independently. > > > > crashkernel=,high can be passed independently, then a crashkernel=,low > > is needed implicitly. If people don't want crashkernel=,low > > explicitly, crashkernel=0,low need be specified. > > I find this complicating the interface. I don't know the background to > the x86 implementation but we diverge already on arm64 since we talk > about ZONE_DMA rather than 4G limit (though for most platforms these > would be the same). > > I guess we could restate the difference between crashkernel= and > crashkernel=,high as the hint to go for allocation above ZONE_DMA first. Yes, rethinking about this, we can make a straightforward and simpler crashkernel=,high|,low on arm64, namely asking for user to clearly specify them. During maintenance of crashkernel= parameter in our distros, we found crashkernel=xM is used mostly since most of systems can be satisfied with 256M or a little more for kdump. While on some big end servers, 1G or more crashkernel memory is needed. In this case, crashkernel=,high is taken. We don't want to reserve so much low memory during system running while just waiting in case rare crash happened. crashkernel=,high is rarely used, so making it simple and not so flexible is not so bad. We can improve it later with justification. > > > An independent crashkernel=,low makes no sense. Crashkernel=,low > > should be paird with crashkernel=,high. > > You could argue that crashkernel=,low gives the current crashkernel= > behaviour, i.e. either all within ZONE_DMA or fail to allocate. So it > may have some value on its own. Yes, crashkernel=,low has the same behaviour as the current crashkernel= if we decide not to add fallback mechanism to it. The purpose of crahskernel=,low is to assist crashkernel=,high to get kdump kernel boot up with satisfing DMA allocation. While allowing independent crashkernel=,low will add it another mission, limiting crashkernel only reserved in low memory. Up to now, we don't see the need for that. > > > My personal opinion according to the existed senmantics on x86. > > Otherwise, the guidance of crashkernel= |,high|,low reservation > > will be complicated to write. > > It's more that I find the current semantics unnecessarily confusing. But > even reading the x86_64 text it's not that clear. For example the > default low allocation for crashkernel= and crashkernel=,high is only > mentioned in the crashkernel=,low description. Yeah, we can improve those document if insufficiency is found. By the way, with my observation, crashkernel= with fallback meet 99% of our needs. If people really need more than 512M memory or more, then please consider crashkernel=,high. Basically on servers, low memory is limited, while high memory is very big. So I agree with you that we can make it step by step, firstly adding basic crashkernel=,high and ,low support. We can add those complicated cases later.