Message ID | 20230312181154.278900-1-sourabhjain@linux.ibm.com (mailing list archive) |
---|---|
Series | PowerPC: in kernel handling of CPU hotplug events for crash kernel |
On 3/12/23 13:11, Sourabh Jain wrote:
> The Problem:
> ============
> Post hotplug/DLPAR events the capture kernel holds stale information about the
> system. Dump collection with stale capture kernel might end up in dump capture
> failure or an inaccurate dump collection.
>
> Existing solution:
> ==================
> The existing solution is to keep the capture kernel up-to-date by monitoring
> hotplug events via a udev rule and triggering a full capture kernel reload for
> every hotplug event.
>
> Shortcomings:
> ------------------------------------------------
> - Leaves a window where kernel crash might not lead to a successful dump
>   collection.
> - Reloading all kexec components for each hotplug is inefficient.
> - udev rules are prone to races if hotplug events are frequent.
>
> More about issues with the existing solution is posted here:
> - https://lkml.org/lkml/2020/12/14/532
> - https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html
>
> Proposed Solution:
> ==================
> Instead of reloading all kexec segments on hotplug event, this patch series
> focuses on updating only the relevant kexec segment. Once the kexec segments
> are loaded in the kernel reserved area then an arch-specific hotplug handler
> will update the relevant kexec segment based on hotplug event type.
>
> Series Dependencies
> ===================
> This patch series implements the crash hotplug handler on PowerPC. The generic
> crash hotplug update is introduced by https://lkml.org/lkml/2023/3/6/1358 patch
> series.
>
> Git tree for testing:
> =====================
> The below git tree has this patch series applied on top of dependent patch
> series.
> https://github.com/sourabhjains/linux/tree/in-kernel-crash-update-v9
>
> To realise the feature the kdump udev rules must be disabled for CPU/Memory
> hotplug events. Comment out the below lines in the kdump udev rule file:
>
> RHEL: /usr/lib/udev/rules.d/98-kexec.rules
>
> #SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"
> #SUBSYSTEM=="memory", ACTION=="online", GOTO="kdump_reload_mem"
> #SUBSYSTEM=="memory", ACTION=="offline", GOTO="kdump_reload_mem"
>
> SLES: /usr/lib/kdump/70-kdump.rules
>
> #SUBSYSTEM=="memory", ACTION=="add|remove", GOTO="kdump_try_restart"
> #SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_try_restart"
>

Sourabh,
The above seems to contradict what I anticipate to be udev rules changes once
the base series is accepted. Specifically I'm suggesting the following:

- Prevent udev from updating kdump crash kernel on hot un/plug changes.
  Add the following as the first lines to the RHEL udev rule file
  /usr/lib/udev/rules.d/98-kexec.rules:

  # The kernel handles updates to crash elfcorehdr for cpu and memory changes
  SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
  SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

  With this changeset applied, the two rules evaluate to false for
  cpu and memory change events and thus skip the userspace
  unload-then-reload of kdump.

The above additions allow distros to deploy the udev rule immediately and work
properly even if the base patchset isn't yet merged, or down the road,
enabled/configured.

Am I missing something such that your recommendation is different than mine?

> Note: only kexec_file_load syscall will work. For kexec_load minor
> changes are required in kexec tool.

Will this be the same/similar change as I have posted, or do you envision
something different?

Thanks,
eric

>
> ---
> Changelog:
>
> v9:
>   - Removed patch to prepare elfcorehdr crash notes for possible CPUs.
>     The patch is moved to generic patch series that introduces generic
>     infrastructure for in kernel crash update.
>   - Removed patch to pass the hotplug action type to the arch crash
>     hotplug handler function. The generic patch series has introduced
>     the hotplug action type in kimage struct.
>   - Add detailed commit messages for better understanding.
>
> v8:
>   - Restrict fdt_index initialization to machine_kexec_post_load so that
>     it works for both kexec_load and kexec_file_load. [3/8] Laurent Dufour
>
>   - Updated the logic to find the number of offline cores. [6/8]
>
>   - Changed the logic to find the elfcore program header to accommodate
>     future memory ranges due to memory hotplug events. [8/8]
>
> v7:
>   - added a new config to configure this feature
>   - pass hotplug action type to arch specific handler
>
> v6:
>   - Added crash memory hotplug support
>
> v5:
>   - Replace CONFIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
>   - Move fdt segment identification for kexec_load case to load path
>     instead of crash hotplug handler
>   - Keep new attribute defined under kimage_arch to track FDT segment
>     under CONFIG_HOTPLUG_CPU config.
>
> v4:
>   - Update the logic to find the additional space needed for hotadd CPUs post
>     kexec load. Refer "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug
>     support for kexec_file_load" patch to know more about the change.
>   - Fix a couple of typos.
>   - Replace pr_err with pr_info_once to warn user about memory hotplug
>     support.
>   - In crash hotplug handler, exit the for loop if FDT segment is found.
>
> v3:
>   - Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
>   - Rebase patches on top of https://lkml.org/lkml/2022/3/3/674 [v5]
>   - Fixed warning reported by checkpatch script
>
> v2:
>   - Use generic hotplug handler introduced by https://lkml.org/lkml/2022/2/9/1406, a
>     significant change from v1.
>
> Sourabh Jain (6):
>   powerpc/kexec: turn some static helper functions public
>   powerpc/crash: introduce a new config option CRASH_HOTPLUG
>   powerpc/crash: add a new member to the kimage_arch struct
>   powerpc/crash: add crash CPU hotplug support
>   crash: forward memory_notify args to arch crash hotplug handler
>   powerpc/kexec: add crash memory hotplug support
>
>  arch/powerpc/Kconfig                    |  12 +
>  arch/powerpc/include/asm/kexec.h        |  15 ++
>  arch/powerpc/include/asm/kexec_ranges.h |   1 +
>  arch/powerpc/kexec/core_64.c            | 322 ++++++++++++++++++++++++
>  arch/powerpc/kexec/elf_64.c             |  13 +-
>  arch/powerpc/kexec/file_load_64.c       | 212 ++++------------
>  arch/powerpc/kexec/ranges.c             |  85 +++++++
>  arch/x86/include/asm/kexec.h            |   2 +-
>  arch/x86/kernel/crash.c                 |   3 +-
>  include/linux/kexec.h                   |   2 +-
>  kernel/crash_core.c                     |  14 +-
>  11 files changed, 506 insertions(+), 175 deletions(-)
>
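The gating that the two proposed udev lines perform can be modeled in a few lines of Python. This is only an illustrative sketch, not udev itself: the function name `kdump_udev_action` and the dict-based event representation are hypothetical. It assumes standard udev matching, where a `GOTO` is taken only when every match key on the line succeeds, and a sysfs attribute that does not exist (for example on a kernel without the base series) makes the `ATTRS{...}` key fail.

```python
def kdump_udev_action(subsystem, attrs):
    """Model the proposed rule lines for one hotplug event (hypothetical helper).

    subsystem: udev SUBSYSTEM of the event, e.g. "cpu" or "memory".
    attrs: sysfs attributes visible for the device, as a dict of strings.
    Returns "skip-reload" when the GOTO="kdump_reload_end" would be taken,
    otherwise "fall-through" to the distro's existing reload rules.
    """
    # Mirrors: SUBSYSTEM=="cpu|memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
    if subsystem in ("cpu", "memory") and attrs.get("crash_hotplug") == "1":
        return "skip-reload"
    # Attribute missing or not "1": the match fails and later rules still run.
    return "fall-through"


# Kernel advertises in-kernel crash updates: userspace reload is skipped.
print(kdump_udev_action("cpu", {"crash_hotplug": "1"}))
# Older kernel without the attribute: the existing reload rules still apply.
print(kdump_udev_action("memory", {}))
```

Under these assumptions the same rule file behaves sensibly on kernels with and without the feature, which is the property described above for deploying the rule ahead of the kernel change.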
Hello Eric,

On 13/03/23 21:12, Eric DeVolder wrote:
>
>
> On 3/12/23 13:11, Sourabh Jain wrote:
>> The Problem:
>> ============
>> Post hotplug/DLPAR events the capture kernel holds stale information about the
>> system. Dump collection with stale capture kernel might end up in dump capture
>> failure or an inaccurate dump collection.
>>
>> Existing solution:
>> ==================
>> The existing solution is to keep the capture kernel up-to-date by monitoring
>> hotplug events via a udev rule and triggering a full capture kernel reload for
>> every hotplug event.
>>
>> Shortcomings:
>> ------------------------------------------------
>> - Leaves a window where kernel crash might not lead to a successful dump
>>   collection.
>> - Reloading all kexec components for each hotplug is inefficient.
>> - udev rules are prone to races if hotplug events are frequent.
>>
>> More about issues with the existing solution is posted here:
>> - https://lkml.org/lkml/2020/12/14/532
>> - https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html
>>
>> Proposed Solution:
>> ==================
>> Instead of reloading all kexec segments on hotplug event, this patch series
>> focuses on updating only the relevant kexec segment. Once the kexec segments
>> are loaded in the kernel reserved area then an arch-specific hotplug handler
>> will update the relevant kexec segment based on hotplug event type.
>>
>> Series Dependencies
>> ===================
>> This patch series implements the crash hotplug handler on PowerPC. The generic
>> crash hotplug update is introduced by https://lkml.org/lkml/2023/3/6/1358 patch
>> series.
>>
>> Git tree for testing:
>> =====================
>> The below git tree has this patch series applied on top of dependent patch
>> series.
>> https://github.com/sourabhjains/linux/tree/in-kernel-crash-update-v9
>>
>> To realise the feature the kdump udev rules must be disabled for CPU/Memory
>> hotplug events. Comment out the below lines in the kdump udev rule file:
>>
>> RHEL: /usr/lib/udev/rules.d/98-kexec.rules
>>
>> #SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"
>> #SUBSYSTEM=="memory", ACTION=="online", GOTO="kdump_reload_mem"
>> #SUBSYSTEM=="memory", ACTION=="offline", GOTO="kdump_reload_mem"
>>
>> SLES: /usr/lib/kdump/70-kdump.rules
>>
>> #SUBSYSTEM=="memory", ACTION=="add|remove", GOTO="kdump_try_restart"
>> #SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_try_restart"
>>
>
> Sourabh,
>
> The above seems to contradict what I anticipate to be udev rules
> changes once the base series is accepted. Specifically I'm suggesting
> the following:
>
> - Prevent udev from updating kdump crash kernel on hot un/plug changes.
>   Add the following as the first lines to the RHEL udev rule file
>   /usr/lib/udev/rules.d/98-kexec.rules:
>
>   # The kernel handles updates to crash elfcorehdr for cpu and memory changes
>   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>
>   With this changeset applied, the two rules evaluate to false for
>   cpu and memory change events and thus skip the userspace
>   unload-then-reload of kdump.
>
> The above additions allow distros to deploy the udev rule immediately
> and work properly even if the base patchset isn't yet merged, or down
> the road, enabled/configured.
>
> Am I missing something such that your recommendation is different than
> mine?

I suggested disabling the udev rules only for testing; your udev rule changes
are the way forward. I will use the above changes to control the kdump service
reload.

>> Note: only kexec_file_load syscall will work. For kexec_load minor
>> changes are required in kexec tool.
>
> Will this be the same/similar change as I have posted, or do you
> envision something different?

I think the generic changes will be the same. I might need to add some
PowerPC-specific changes to make sure the elfcorehdr and FDT kexec segments
have additional buffer space to accommodate additional memory ranges.

Thanks for the suggestion, I will align the PowerPC kexec tool changes with
your changes.

- Sourabh
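To give a feel for the "additional buffer space" mentioned for the elfcorehdr, here is a rough sizing sketch in Python. It is not taken from the patch series or the kexec tool. The helper `elfcorehdr_size` and its parameters are hypothetical, and the layout is an assumption: one ELF64 header, one PT_NOTE program header per possible CPU, one PT_NOTE for vmcoreinfo, and one PT_LOAD per memory range, with extra program-header slots reserved for ranges that may appear after memory hot-add. A real layout may carry additional headers. The 64-byte and 56-byte constants are the standard sizes of `Elf64_Ehdr` and `Elf64_Phdr`.

```python
ELF64_EHDR_SIZE = 64  # sizeof(Elf64_Ehdr)
ELF64_PHDR_SIZE = 56  # sizeof(Elf64_Phdr)


def elfcorehdr_size(possible_cpus, mem_ranges, spare_mem_ranges=0):
    """Estimate the elfcorehdr buffer size in bytes (hypothetical helper).

    Assumed layout: one PT_NOTE per possible CPU, one PT_NOTE for
    vmcoreinfo, one PT_LOAD per current memory range, plus spare
    PT_LOAD slots kept free for ranges added by later hotplug events.
    """
    phdrs = possible_cpus + 1 + mem_ranges + spare_mem_ranges
    return ELF64_EHDR_SIZE + phdrs * ELF64_PHDR_SIZE


# 8 possible CPUs and 4 memory ranges, with no headroom for hot-add.
print(elfcorehdr_size(8, 4))
# Same system with room reserved for 16 additional memory ranges.
print(elfcorehdr_size(8, 4, spare_mem_ranges=16))
```

The difference between the two results is the headroom a loader would have to reserve at load time so that an in-kernel update can add program headers without reallocating the segment.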