mbox series

[v18,00/17] support reserving crashkernel above 4G on arm64 kdump

Message ID 20211222130820.1754-1-thunder.leizhen@huawei.com
Headers show
Series support reserving crashkernel above 4G on arm64 kdump | expand

Message

Leizhen (ThunderTown) Dec. 22, 2021, 1:08 p.m. UTC
There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is no enough low memory.
2. If reserving crashkernel above 4G, in this case, crash dump
kernel will boot failure because there is no low memory available
for allocation.

To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in DMA zone and fall back to high
allocation if it fails.

We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M low memory in
DMA zone automatically and "crashkernel=Y,low" can be used to allocate
specified size low memory.

When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices. So there may be two regions reserved for
crash dump kernel.
In order to distinct from the high region and make no effect to the use
of existing kexec-tools, rename the low region as "Crash kernel (low)",
and pass the low region by reusing DT property
"linux,usable-memory-range". We made the low memory region as the last
range of "linux,usable-memory-range" to keep compatibility with existing
user-space and older kdump kernels.

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions(see [1])

Another update is document about DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema(see [2])

This patchset contains the following 17 patches:

0001-0003 are some x86 cleanups which prepares for making functionsreserve_crashkernel[_low]() generic.
0004-0006 Add helper parse_crashkernel_in_order() and refact reserve_crashkernel{_low}
0007 cleanup
0008-0009 makes functions reserve_crashkernel[_low]() generic.
0010-0012 do some cleanups for parse_crashkernel_{high|low} and __parse_crashkernel()
0013-0014 reimplements arm64 crashkernel=X.
0015-0016 adds memory for devices by DT property linux,usable-memory-range.
0017 updates the doc.

Changes since [v17]
The patches 0013-0016 have no change, 0017 see below, all other 0001-0012 patches
are rewritten or new. The main change is patches 0004-0005, which refact
reserve_crashkernel{_low}. Patch 0009 reduced some unnecessary changes compared
with 0005 in v17, two other differences are broken down into 0001 and 0008.

New Changes in 0017:
-			It will be ignored if crashkernel=X is specified.
+			It will be ignored if crashkernel=X is correctly specified.

Changes since [v16]
- Because no functional changes in this version, so add
  "Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>" for patch 1-9
- Add "Reviewed-by: Rob Herring <robh@kernel.org>" for patch 8
- Update patch 9 based on the review comments of Rob Herring
- As Catalin Marinas's suggestion, merge the implementation of
  ARCH_WANT_RESERVE_CRASH_KERNEL into patch 5. Ensure that the
  contents of X86 and ARM64 do not overlap, and reduce unnecessary
  temporary differences.

Changes since [v15]
-  Aggregate the processing of "linux,usable-memory-range" into one function.
   Only patch 9-10 have been updated.

Changes since [v14]
- Recovering the requirement that the CrashKernel memory regions on X86
  only requires 1 MiB alignment.
- Combine patches 5 and 6 in v14 into one. The compilation warning fixed
  by patch 6 was introduced by patch 5 in v14.
- As with crashk_res, crashk_low_res is also processed by
  crash_exclude_mem_range() in patch 7.
- Due to commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
  has removed the architecture-specific code, extend the property "linux,usable-memory-range"
  in the platform-agnostic FDT core code. See patch 9.
- Discard the x86 description update in the document, because the description
  has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
- Change "arm64" to "ARM64" in Doc.


Changes since [v13]
- Rebased on top of 5.11-rc5.
- Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
Since reserve_crashkernel[_low]() implementations are quite similar on
other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
arch/Kconfig and select this by X86 and ARM64.
- Some minor cleanup.

Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.

Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
and arm64 use the generic version to reimplement crashkernel=X.

Changes since [v10]
- Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.

Changes since [v9]
- Patch 1 add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt
suggested by Arnd.
- Add Tested-by from Jhon and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, reserve spcified size low
memory for crash kdump kernel devices firstly and then reserve memory above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass to crash dump kernel by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() i added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions, for two crash kernel regions
case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
and then remove the memory range in the middle.

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19 
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
[v12]: https://lkml.org/lkml/2020/9/7/1037
[v13]: https://lkml.org/lkml/2020/10/31/34
[v14]: https://lkml.org/lkml/2021/1/30/53
[v15]: https://lkml.org/lkml/2021/10/19/1405
[v16]: https://lkml.org/lkml/2021/11/23/435
[v17}: https://lkml.org/lkml/2021/12/10/38

Chen Zhou (9):
  x86/setup: Move CRASH_ALIGN and CRASH_ADDR_{LOW|HIGH}_MAX to
    asm/kexec.h
  x86/setup: Move xen_pv_domain() check and insert_resource() to
    setup_arch()
  x86/setup: Eliminate a magic number in reserve_crashkernel()
  x86/setup: Add build option ARCH_WANT_RESERVE_CRASH_KERNEL
  x86/setup: Move reserve_crashkernel[_low]() into crash_core.c
  arm64: kdump: introduce some macros for crash kernel reservation
  arm64: kdump: reimplement crashkernel=X
  of: fdt: Add memory for devices by DT property
    "linux,usable-memory-range"
  kdump: update Documentation about crashkernel

Zhen Lei (8):
  x86/setup: Adjust the range of codes separated by CONFIG_X86_64
  x86/setup: Add helper parse_crashkernel_in_order()
  x86/setup: Use parse_crashkernel_in_order() to make code logic clear
  x86/setup: Update comments in reserve_crashkernel()
  kdump: Simplify the parameters of __parse_crashkernel()
  kdump: Make parse_crashkernel_{high|low} static
  kdump: Reduce unused parameters of parse_crashkernel_{high|low}
  of: fdt: Aggregate the processing of "linux,usable-memory-range"

 Documentation/admin-guide/kdump/kdump.rst     |  11 +-
 .../admin-guide/kernel-parameters.txt         |  13 +-
 arch/Kconfig                                  |   3 +
 arch/arm64/Kconfig                            |   1 +
 arch/arm64/include/asm/kexec.h                |  10 +
 arch/arm64/kernel/machine_kexec_file.c        |  12 +-
 arch/arm64/kernel/setup.c                     |  13 +-
 arch/arm64/mm/init.c                          |  59 +----
 arch/x86/Kconfig                              |   2 +
 arch/x86/include/asm/kexec.h                  |  28 +++
 arch/x86/kernel/setup.c                       | 166 +-------------
 drivers/of/fdt.c                              |  42 +++-
 include/linux/crash_core.h                    |   4 -
 kernel/crash_core.c                           | 207 ++++++++++++++++--
 14 files changed, 328 insertions(+), 243 deletions(-)

Comments

Borislav Petkov Dec. 22, 2021, 8:43 p.m. UTC | #1
On Wed, Dec 22, 2021 at 09:08:04PM +0800, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> We want to make function reserve_crashkernel[_low](), which is implemented
  ^^

Please use passive voice in your commit message: no "we" or "I", etc,
and describe your changes in imperative mood.

Also, pls read section "2) Describe your changes" in
Documentation/process/submitting-patches.rst for more details.

Bottom line is: personal pronouns are ambiguous in text, especially with
so many parties/companies/etc developing the kernel so let's avoid them
please.

> by X86, available to other architectures. It references macro CRASH_ALIGN

"x86"

> and will be moved to public crash_core.c. But the defined values of
> CRASH_ALIGN may be different in different architectures. So moving the
> definition of CRASH_ALIGN to asm/kexec.h is a good choice.
> 
> The reason for moving CRASH_ADDR_{LOW|HIGH}_MAX is the same as above.

This commit message needs to say something along the lines of:

"Move CRASH_ALIGN and ... to the arch-specific header in preparation
of making reserve_crashkernel[_low]() generic, used by other
architectures."

or so.
Leizhen (ThunderTown) Dec. 23, 2021, 2:09 a.m. UTC | #2
On 2021/12/23 4:43, Borislav Petkov wrote:
> On Wed, Dec 22, 2021 at 09:08:04PM +0800, Zhen Lei wrote:
>> From: Chen Zhou <chenzhou10@huawei.com>
>>
>> We want to make function reserve_crashkernel[_low](), which is implemented
>   ^^
> 
> Please use passive voice in your commit message: no "we" or "I", etc,
> and describe your changes in imperative mood.

My bad language habits. I've made this mistake several times.

> 
> Also, pls read section "2) Describe your changes" in
> Documentation/process/submitting-patches.rst for more details.
> 
> Bottom line is: personal pronouns are ambiguous in text, especially with
> so many parties/companies/etc developing the kernel so let's avoid them
> please.

OK, I'll check the description of the other patches.

> 
>> by X86, available to other architectures. It references macro CRASH_ALIGN
> 
> "x86"

OK

> 
>> and will be moved to public crash_core.c. But the defined values of
>> CRASH_ALIGN may be different in different architectures. So moving the
>> definition of CRASH_ALIGN to asm/kexec.h is a good choice.
>>
>> The reason for moving CRASH_ADDR_{LOW|HIGH}_MAX is the same as above.
> 
> This commit message needs to say something along the lines of:
> 
> "Move CRASH_ALIGN and ... to the arch-specific header in preparation
> of making reserve_crashkernel[_low]() generic, used by other
> architectures."

OK, I will use this one, thanks.

By the way, patch 0004-0006 were written based on your suggestion. Can you
take a moment to review it? I think I forgot to add "Suggested-by".

> 
> or so.
>
Borislav Petkov Dec. 23, 2021, 5:26 p.m. UTC | #3
On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> We will make the functions reserve_crashkernel() as generic, the
> xen_pv_domain() check in reserve_crashkernel() is relevant only to
> x86,

Why is that so? Is Xen-PV x86-only?

> the same as insert_resource() in reserve_crashkernel[_low]().

Why?

Looking at

  0212f9159694 ("x86: Add Crash kernel low reservation")

it *surprisingly* explains why that resources thing is being added:

    We need to add another range in /proc/iomem like "Crash kernel low",
    so kexec-tools could find that info and append to kdump kernel
    command line.

Then,

  157752d84f5d ("kexec: use Crash kernel for Crash kernel low")

renamed it because, as it states, kexec-tools was taught to handle
multiple resources of the same name.

So why does kexec-tools on arm *not* need those iomem resources? How
does it parse the ranges there? Questions over questions...

So last time I told you to sit down and take your time with this cleanup.
From reading this here, it doesn't look like it. Rather, it looks like
hastily done in a hurry and hurrying stuff doesn't help you one bit - it
actually makes it worse.

Your commit messages need to explain *why* a change is being done and
why is that ok. This one doesn't.

> @@ -1120,7 +1109,17 @@ void __init setup_arch(char **cmdline_p)
>  	 * Reserve memory for crash kernel after SRAT is parsed so that it
>  	 * won't consume hotpluggable memory.
>  	 */
> -	reserve_crashkernel();
> +#ifdef CONFIG_KEXEC_CORE
> +	if (xen_pv_domain())
> +		pr_info("Ignoring crashkernel for a Xen PV domain\n");

This is wrong - the check is currently being done inside
reserve_crashkernel(), *after* it has parsed a crashkernel= cmdline
correctly - and not before.

Your change would print on Xen PV, regardless of whether it has received
crashkernel= on the cmdline or not.

This is exactly why I say that making those functions generic and shared
might not be such a good idea, after all, because then you'd have to
sprinkle around arch-specific stuff.

One of the ways how to address this particular case here would be:

1. Add a x86-specific wrapper around parse_crashkernel() which does
all the parsing. When that wrapper finishes, you should have parsed
everything that has crashkernel= on the cmdline.

2. At the end of that wrapper, you do arch-specific checks and setup
like the xen_pv_domain() one.

3. Now, you do reserve_crashkernel(), if those checks pass.

The question is, whether the flow on arm64 can do the same. Probably but
it needs careful auditing.
Leizhen (ThunderTown) Dec. 24, 2021, 6:36 a.m. UTC | #4
On 2021/12/24 1:26, Borislav Petkov wrote:
> On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
>> From: Chen Zhou <chenzhou10@huawei.com>
>>
>> We will make the functions reserve_crashkernel() as generic, the
>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>> x86,
> 
> Why is that so? Is Xen-PV x86-only?
> 
>> the same as insert_resource() in reserve_crashkernel[_low]().
> 
> Why?
> 
> Looking at
> 
>   0212f9159694 ("x86: Add Crash kernel low reservation")
> 
> it *surprisingly* explains why that resources thing is being added:
> 
>     We need to add another range in /proc/iomem like "Crash kernel low",
>     so kexec-tools could find that info and append to kdump kernel
>     command line.
> 
> Then,
> 
>   157752d84f5d ("kexec: use Crash kernel for Crash kernel low")
> 
> renamed it because, as it states, kexec-tools was taught to handle
> multiple resources of the same name.
> 
> So why does kexec-tools on arm *not* need those iomem resources? How
> does it parse the ranges there? Questions over questions...

https://lkml.org/lkml/2019/4/4/1758

Chen Zhou has explained before, see below. I'll analyze why x86 and arm64 need
to process iomem resources at different times.

 < This very reminds what x86 does. Any chance some of the code can be reused
 < rather than duplicated?
As i said in the comment, i transport reserve_crashkernel_low() from x86_64. There are minor
differences. In arm64, we don't need to do insert_resource(), we do request_resource()
in request_standard_resources() later.

> 
> So last time I told you to sit down and take your time with this cleanup.
>>From reading this here, it doesn't look like it. Rather, it looks like
> hastily done in a hurry and hurrying stuff doesn't help you one bit - it
> actually makes it worse.
> 
> Your commit messages need to explain *why* a change is being done and
> why is that ok. This one doesn't.

OK, I'll do this in follow-up patches.

> 
>> @@ -1120,7 +1109,17 @@ void __init setup_arch(char **cmdline_p)
>>  	 * Reserve memory for crash kernel after SRAT is parsed so that it
>>  	 * won't consume hotpluggable memory.
>>  	 */
>> -	reserve_crashkernel();
>> +#ifdef CONFIG_KEXEC_CORE
>> +	if (xen_pv_domain())
>> +		pr_info("Ignoring crashkernel for a Xen PV domain\n");
> 
> This is wrong - the check is currently being done inside
> reserve_crashkernel(), *after* it has parsed a crashkernel= cmdline
> correctly - and not before.
> 
> Your change would print on Xen PV, regardless of whether it has received
> crashkernel= on the cmdline or not.

Yes, you're right. There are changes in code logic, but the print doesn't
seem to cause any misunderstanding.

> 
> This is exactly why I say that making those functions generic and shared
> might not be such a good idea, after all, because then you'd have to
> sprinkle around arch-specific stuff.

Yes, I'm thinking about that too. Perhaps they are not suitable for full
code sharing, but it looks like there's some code that can be shared.
For example, the function parse_crashkernel_in_order() that I extracted
based on your suggestion, it could also be parse_crashkernel_high_low().
Or the function reserve_crashkernel_low().

There are two ways to reserve memory above 4G:
1. Use crashkernel=X,high, with or without crashkernel=X,low
2. Use crashkernel=X,[offset], but try low memory first. If failed, then
   try high memory, and retry at least 256M low memory.

I plan to only implement 2 in the next version so that there can be fewer
changes. Then implement 1 after 2 is applied.

> 
> One of the ways how to address this particular case here would be:
> 
> 1. Add a x86-specific wrapper around parse_crashkernel() which does
> all the parsing. When that wrapper finishes, you should have parsed
> everything that has crashkernel= on the cmdline.
> 
> 2. At the end of that wrapper, you do arch-specific checks and setup
> like the xen_pv_domain() one.
> 
> 3. Now, you do reserve_crashkernel(), if those checks pass.
> 
> The question is, whether the flow on arm64 can do the same. Probably but
> it needs careful auditing.
>
Leizhen (ThunderTown) Dec. 25, 2021, 1:53 a.m. UTC | #5
On 2021/12/24 14:36, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/12/24 1:26, Borislav Petkov wrote:
>> On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
>>> From: Chen Zhou <chenzhou10@huawei.com>
>>>
>>> We will make the functions reserve_crashkernel() as generic, the
>>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>>> x86,
>>
>> Why is that so? Is Xen-PV x86-only?
>>
>>> the same as insert_resource() in reserve_crashkernel[_low]().
>>
>> Why?
>>
>> Looking at
>>
>>   0212f9159694 ("x86: Add Crash kernel low reservation")
>>
>> it *surprisingly* explains why that resources thing is being added:
>>
>>     We need to add another range in /proc/iomem like "Crash kernel low",
>>     so kexec-tools could find that info and append to kdump kernel
>>     command line.
>>
>> Then,
>>
>>   157752d84f5d ("kexec: use Crash kernel for Crash kernel low")
>>
>> renamed it because, as it states, kexec-tools was taught to handle
>> multiple resources of the same name.
>>
>> So why does kexec-tools on arm *not* need those iomem resources? How
>> does it parse the ranges there? Questions over questions...

It's a good question worth figuring out. I'm going to dig into this.
I admire your rigorous style and sharp vision.

> 
> https://lkml.org/lkml/2019/4/4/1758
> 
> Chen Zhou has explained before, see below. I'll analyze why x86 and arm64 need
> to process iomem resources at different times.
> 
>  < This very reminds what x86 does. Any chance some of the code can be reused
>  < rather than duplicated?
> As i said in the comment, i transport reserve_crashkernel_low() from x86_64. There are minor
> differences. In arm64, we don't need to do insert_resource(), we do request_resource()
> in request_standard_resources() later.
> 
>>
>> So last time I told you to sit down and take your time with this cleanup.
>> >From reading this here, it doesn't look like it. Rather, it looks like
>> hastily done in a hurry and hurrying stuff doesn't help you one bit - it
>> actually makes it worse.
>>
>> Your commit messages need to explain *why* a change is being done and
>> why is that ok. This one doesn't.
> 
> OK, I'll do this in follow-up patches.
> 
>>
>>> @@ -1120,7 +1109,17 @@ void __init setup_arch(char **cmdline_p)
>>>  	 * Reserve memory for crash kernel after SRAT is parsed so that it
>>>  	 * won't consume hotpluggable memory.
>>>  	 */
>>> -	reserve_crashkernel();
>>> +#ifdef CONFIG_KEXEC_CORE
>>> +	if (xen_pv_domain())
>>> +		pr_info("Ignoring crashkernel for a Xen PV domain\n");

Right, these two lines of code do not need to be moved. xen_pv_domain() is
a friendly macro function.

>>
>> This is wrong - the check is currently being done inside
>> reserve_crashkernel(), *after* it has parsed a crashkernel= cmdline
>> correctly - and not before.
>>
>> Your change would print on Xen PV, regardless of whether it has received
>> crashkernel= on the cmdline or not.
> 
> Yes, you're right. There are changes in code logic, but the print doesn't
> seem to cause any misunderstanding.
> 
>>
>> This is exactly why I say that making those functions generic and shared
>> might not be such a good idea, after all, because then you'd have to
>> sprinkle around arch-specific stuff.
> 
> Yes, I'm thinking about that too. Perhaps they are not suitable for full
> code sharing, but it looks like there's some code that can be shared.
> For example, the function parse_crashkernel_in_order() that I extracted
> based on your suggestion, it could also be parse_crashkernel_high_low().
> Or the function reserve_crashkernel_low().
> 
> There are two ways to reserve memory above 4G:
> 1. Use crashkernel=X,high, with or without crashkernel=X,low
> 2. Use crashkernel=X,[offset], but try low memory first. If failed, then
>    try high memory, and retry at least 256M low memory.
> 
> I plan to only implement 2 in the next version so that there can be fewer
> changes. Then implement 1 after 2 is applied.

I tried it yesterday and it didn't work. I still have to deal with the
problem of adjusting insert_resource().

How about I isolate some cleanup patches first? Strive for them to be
merged into v5.17. This way, we can focus on the core changes in the
next version. And I can also save some repetitive rebase workload.

> 
>>
>> One of the ways how to address this particular case here would be:
>>
>> 1. Add a x86-specific wrapper around parse_crashkernel() which does
>> all the parsing. When that wrapper finishes, you should have parsed
>> everything that has crashkernel= on the cmdline.
>>
>> 2. At the end of that wrapper, you do arch-specific checks and setup
>> like the xen_pv_domain() one.
>>
>> 3. Now, you do reserve_crashkernel(), if those checks pass.
>>
>> The question is, whether the flow on arm64 can do the same. Probably but
>> it needs careful auditing.
>>
Leizhen (ThunderTown) Dec. 25, 2021, 1:58 a.m. UTC | #6
On 2021/12/22 21:08, Zhen Lei wrote:
> Currently, there are two possible combinations of configurations.
> (1) crashkernel=X[@offset]
> (2) crashkernel=X,high, with or without crashkernel=X,low
> 
> (1) has the highest priority, if it is configured correctly, (2) will be
> ignored. Similarly, in combination (2), crashkernel=X,low is valid only
> when crashkernel=X,high is valid.
> 
> Putting the operations of parsing all "crashkernel=" configurations in one
> function helps to sort out the strong dependency.
> 
> So add helper parse_crashkernel_in_order(). The "__maybe_unused" will be
> removed in the next patch.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  arch/x86/kernel/setup.c | 51 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index d9080bfa131a654..f997074d36f2484 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -439,6 +439,57 @@ static int __init reserve_crashkernel_low(void)
>  }
>  #endif
>  
> +#define CRASHKERNEL_MEM_NONE		0x0	/* crashkernel= is not exist or invalid */
> +#define CRASHKERNEL_MEM_CLASSIC		0x1	/* crashkernel=X[@offset] is valid */
> +#define CRASHKERNEL_MEM_HIGH		0x2	/* crashkernel=X,high is valid */
> +#define CRASHKERNEL_MEM_LOW		0x4	/* crashkernel=X,low is valid */
> +
> +/**
> + * parse_crashkernel_in_order - Parse all "crashkernel=" configurations in
> + *				priority order until a valid combination is found.
> + * @cmdline:	The bootup command line.
> + * @system_ram: Total system memory size.
> + * @crash_size: Save the memory size specified by "crashkernel=X[@offset]" or
> + *		"crashkernel=X,high".
> + * @crash_base: Save the base address specified by "crashkernel=X@offset"
> + * @low_size:	Save the memory size specified by "crashkernel=X,low"
> + *
> + * Returns the status flag of the parsing result of "crashkernel=", such as
> + * CRASHKERNEL_MEM_NONE, CRASHKERNEL_MEM_HIGH.
> + */
> +__maybe_unused
> +static int __init parse_crashkernel_in_order(char *cmdline,
> +					     unsigned long long system_ram,
> +					     unsigned long long *crash_size,
> +					     unsigned long long *crash_base,
> +					     unsigned long long *low_size)

I rethought yesterday that this function name is not self-annotated. In addition,
the meaning of the return value is not mainstream. It would be better to change it
to parse_crashkernel_high_low().

> +{
> +	int ret, flag = CRASHKERNEL_MEM_NONE;
> +
> +	BUG_ON(!crash_size || !crash_base || !low_size);
> +
> +	/* crashkernel=X[@offset] */
> +	ret = parse_crashkernel(cmdline, system_ram, crash_size, crash_base);
> +	if (!ret && crash_size > 0)
> +		return CRASHKERNEL_MEM_CLASSIC;
> +
> +#ifdef CONFIG_X86_64
> +	/* crashkernel=X,high */
> +	ret = parse_crashkernel_high(cmdline, system_ram, crash_size, crash_base);
> +	if (ret || crash_size <= 0)
> +		return CRASHKERNEL_MEM_NONE;
> +
> +	flag = CRASHKERNEL_MEM_HIGH;
> +
> +	/* crashkernel=Y,low */
> +	ret = parse_crashkernel_low(cmdline, system_ram, low_size, crash_base);
> +	if (!ret)
> +		flag |= CRASHKERNEL_MEM_LOW;
> +#endif
> +
> +	return flag;
> +}
> +
>  static void __init reserve_crashkernel(void)
>  {
>  	unsigned long long crash_size, crash_base, total_mem;
>
Leizhen (ThunderTown) Dec. 25, 2021, 10:16 a.m. UTC | #7
On 2021/12/25 9:53, Leizhen (ThunderTown) wrote:
>>> This is exactly why I say that making those functions generic and shared
>>> might not be such a good idea, after all, because then you'd have to
>>> sprinkle around arch-specific stuff.

Hi Borislav and all:
  Merry Christmas!

  I have a new idea now. It helps us get around all the arguments and
minimizes changes to the x86 (also to arm64).
  Previously, Chen Zhou and I tried to share the entire function
reserve_crashkernel(), which led to the following series of problems:
1. reserve_crashkernel() is also defined on other architectures, so we should
   add build option ARCH_WANT_RESERVE_CRASH_KERNEL to avoid conflicts.
2. Move xen_pv_domain() check out of reserve_crashkernel().
3. Move insert_resource() out of reserve_crashkernel()

Others:
4. start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
                                                  crash_base + crash_size);
   Change SZ_1M to CRASH_ALIGN, or keep it no change.
   The current conclusion is no change. But I think adding a new macro
   CRASH_FIXED_ALIGN is also a way. 2M alignment allows page tables to
   use block mappings for most architectures.
5. if (crash_base >= (1ULL << 32) && reserve_crashkernel_low())
   Change (1ULL << 32) to CRASH_ADDR_LOW_MAX, or keep it no change.
   I reanalyzed it, and this doesn't need to be changed.

So for 1-3,why not add a new function reserve_crashkernel_mem() and rename
reserve_crashkernel_low() to reserve_crashkernel_mem_low().
On x86:
static void __init reserve_crashkernel(void)
{
	//Parse all "crashkernel=" configurations in priority order until
        //a valid combination is found. Or return upon failure.
	
	if (xen_pv_domain()) {
                pr_info("Ignoring crashkernel for a Xen PV domain\n");
                return;
        }

	//Call reserve_crashkernel_mem() to reserve crashkernel memory, it will
	//call reserve_crashkernel_mem_low() if needed.

	if (crashk_low_res.end)
		insert_resource(&iomem_resource, &crashk_low_res);
	insert_resource(&iomem_resource, &crashk_res);
}

On arm64:
static void __init reserve_crashkernel(void)
{
	//Parse all "crashkernel=" configurations in priority order until
        //a valid combination is found. Or return upon failure.
	
	//Call reserve_crashkernel_mem() to reserve crashkernel memory, it will
	//call reserve_crashkernel_mem_low() if needed.
}


1. reserve_crashkernel() is still static, so that there is no
   need to add ARCH_WANT_RESERVE_CRASH_KERNEL.
2. The xen_pv_domain() check have not been affected in any way.
   Hi Borislav:
     As you mentioned, this check may also be needed on arm64. But it may be
   better not to add it until the problem is actually triggered on arm64.
3. insert_resource() is not moved outside reserve_crashkernel() on x86.
   Hi Borislav:
     Currently, I haven't figured out why request_resource() can't be replaced
   with insert_resource() on arm64. But I have a hunch that the kexec tool may
   be involved. The cost of modification on arm64 is definitely higher than that
   on x86. Other architectures that want to use reserve_crashkernel_mem() may
   also face the same problem. So it's probably better that function
   reserve_crashkernel_mem() doesn't invoke insert_resource().

I guess you have a long Christmas holiday. So I'm going to send the next version
without waiting for your response.


>> Yes, I'm thinking about that too. Perhaps they are not suitable for full
>> code sharing, but it looks like there's some code that can be shared.
>> For example, the function parse_crashkernel_in_order() that I extracted
>> based on your suggestion, it could also be parse_crashkernel_high_low().
>> Or the function reserve_crashkernel_low().
>>
>> There are two ways to reserve memory above 4G:
>> 1. Use crashkernel=X,high, with or without crashkernel=X,low
>> 2. Use crashkernel=X,[offset], but try low memory first. If failed, then
>>    try high memory, and retry at least 256M low memory.
>>
>> I plan to only implement 2 in the next version so that there can be fewer
>> changes. Then implement 1 after 2 is applied.
> I tried it yesterday and it didn't work. I still have to deal with the
> problem of adjusting insert_resource().
> 
> How about I isolate some cleanup patches first? Strive for them to be
> merged into v5.17. This way, we can focus on the core changes in the
> next version. And I can also save some repetitive rebase workload.
>
Leizhen (ThunderTown) Jan. 7, 2022, 8:13 a.m. UTC | #8
On 2021/12/25 9:53, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/12/24 14:36, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/12/24 1:26, Borislav Petkov wrote:
>>> On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
>>>> From: Chen Zhou <chenzhou10@huawei.com>
>>>>
>>>> We will make the functions reserve_crashkernel() as generic, the
>>>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>>>> x86,
>>>
>>> Why is that so? Is Xen-PV x86-only?
>>>
>>>> the same as insert_resource() in reserve_crashkernel[_low]().
>>>
>>> Why?
>>>
>>> Looking at
>>>
>>>   0212f9159694 ("x86: Add Crash kernel low reservation")
>>>
>>> it *surprisingly* explains why that resources thing is being added:
>>>
>>>     We need to add another range in /proc/iomem like "Crash kernel low",
>>>     so kexec-tools could find that info and append to kdump kernel
>>>     command line.
>>>
>>> Then,
>>>
>>>   157752d84f5d ("kexec: use Crash kernel for Crash kernel low")
>>>
>>> renamed it because, as it states, kexec-tools was taught to handle
>>> multiple resources of the same name.
>>>
>>> So why does kexec-tools on arm *not* need those iomem resources? How
>>> does it parse the ranges there? Questions over questions...

Hi Borislav:
  The reason why insert_resource() cannot be used in reserve_crashkernel[_low]()
on arm64 is clear. The parent resource node of crashk[_low]_res is added by
request_resource() in request_standard_resources(), so that it will be conflicted.
All request_resource() in request_standard_resources() should be changed to
insert_resource(), to make insert_resource() can be used in reserve_crashkernel[_low]().

  I found commit e25e6e7593ca ("kdump, x86: Process multiple Crash kernel in /proc/iomem")
in kexec-tools. I'm trying to port it to arm64, or make it generic.

  Thanks.

> 
> It's a good question worth figuring out. I'm going to dig into this.
> I admire your rigorous style and sharp vision.
>
Leizhen (ThunderTown) Jan. 7, 2022, 1:09 p.m. UTC | #9
On 2022/1/7 16:13, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/12/25 9:53, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/12/24 14:36, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2021/12/24 1:26, Borislav Petkov wrote:
>>>> On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
>>>>> From: Chen Zhou <chenzhou10@huawei.com>
>>>>>
>>>>> We will make the functions reserve_crashkernel() as generic, the
>>>>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>>>>> x86,
>>>>
>>>> Why is that so? Is Xen-PV x86-only?
>>>>
>>>>> the same as insert_resource() in reserve_crashkernel[_low]().
>>>>
>>>> Why?
>>>>
>>>> Looking at
>>>>
>>>>   0212f9159694 ("x86: Add Crash kernel low reservation")
>>>>
>>>> it *surprisingly* explains why that resources thing is being added:
>>>>
>>>>     We need to add another range in /proc/iomem like "Crash kernel low",
>>>>     so kexec-tools could find that info and append to kdump kernel
>>>>     command line.
>>>>
>>>> Then,
>>>>
>>>>   157752d84f5d ("kexec: use Crash kernel for Crash kernel low")
>>>>
>>>> renamed it because, as it states, kexec-tools was taught to handle
>>>> multiple resources of the same name.
>>>>
>>>> So why does kexec-tools on arm *not* need those iomem resources? How
>>>> does it parse the ranges there? Questions over questions...
> 
> Hi Borislav:
>   The reason why insert_resource() cannot be used in reserve_crashkernel[_low]()
> on arm64 is clear. The parent resource node of crashk[_low]_res is added by
> request_resource() in request_standard_resources(), so that it will be conflicted.
> All request_resource() in request_standard_resources() should be changed to
> insert_resource(), to make insert_resource() can be used in reserve_crashkernel[_low]().
> 
>   I found commit e25e6e7593ca ("kdump, x86: Process multiple Crash kernel in /proc/iomem")
> in kexec-tools. I'm trying to port it to arm64, or make it generic.

Chen Zhou's done it before. But the "Crash kernel (low)" can really be eliminated. Chen
Zhou just used it to distinguish whether the crashkernel memory range is crashkernel load
range or not. We can use get_crash_kernel_load_range() to get and check the load range.


> 
>   Thanks.
> 
>>
>> It's a good question worth figuring out. I'm going to dig into this.
>> I admire your rigorous style and sharp vision.
>>
> 
>