Message ID | 20230504151100.v4.10.I00dfd6386ee00da25bf26d140559a41339b53e57@changeid (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Headers | show |
Series | watchdog/hardlockup: Add the buddy hardlockup detector | expand |
On Thu 2023-05-04 15:13:42, Douglas Anderson wrote: > In preparation for the buddy hardlockup detector, which wants the same > petting logic as the current perf hardlockup detector, move the code > to watchdog.c. While doing this, rename the global variable to match > others nearby. The arch_touch_nmi_watchdog() function is not renamed > since that is exported with "EXPORT_SYMBOL" and is thus ABI. > > Currently the code in watchdog.c is guarded by > CONFIG_HARDLOCKUP_DETECTOR_PERF, which makes this change seem > silly. However, a future patch will change this. > > NOTE: there is a slight side effect to this change, though from code > analysis I believe it to be a beneficial one. Specifically the perf > hardlockup detector will now do check the "timestamp" before clearing > any watchdog pets. Previously the order was reversed. With the old > order if the watchdog perf event was firing super fast then it would > also be clearing existing watchdog pets super fast. The new behavior > of handling super-fast perf before clearing watchdog pets seems > better. Ah, I think that this was actually pretty serious bug in the perf detector. But I think that it should work another way, see below. > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -111,6 +125,11 @@ static void watchdog_hardlockup_interrupt_count(void) > > void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) > { > + if (__this_cpu_read(watchdog_hardlockup_touch)) { > + __this_cpu_write(watchdog_hardlockup_touch, false); > + return; > + } If we clear watchdog_hardlockup_touch() here then watchdog_hardlockup_check() won't be called yet another watchdog_hrtimer_sample_threshold perior. It means that any touch will cause ignoring one full period. The is_hardlockup() check will be done after full two periods. It is not ideal, see below. > + > /* > * Check for a hardlockup by making sure the CPU's timer > * interrupt is incrementing. The timer interrupt should have > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c > index 9be90b2a2ea7..547917ebd5d3 100644 > --- a/kernel/watchdog_perf.c > +++ b/kernel/watchdog_perf.c > @@ -112,11 +98,6 @@ static void watchdog_overflow_callback(struct perf_event *event, > /* Ensure the watchdog never gets throttled */ > event->hw.interrupts = 0; > > - if (__this_cpu_read(watchdog_nmi_touch) == true) { > - __this_cpu_write(watchdog_nmi_touch, false); > - return; > - } The original code looks wrong. arch_touch_nmi_watchdog() caused skipping only one period of the perf event. I would expect that it caused restarting the period, something like: if (__this_cpu_read(watchdog_nmi_touch) == true) { /* * Restart the period after which the interrupt * counter is checked. */ __this_cpu_write(nmi_rearmed, 0); __this_cpu_write(last_timestamp, now); __this_cpu_write(watchdog_nmi_touch, false); return; } By other words, we should restart the period in the very next perf event after the watchdog was touched. That said, the new code looks better than the original. IMHO, the original code was prone to false positives. Best Regards, Petr PS: It might be worth fixing this problem in a separate patch at the beginning of this patchset. It might be a candidate for stable backports. > - > if (!watchdog_check_timestamp()) > return; >
Hi, On Thu, May 11, 2023 at 8:46 AM Petr Mladek <pmladek@suse.com> wrote: > > > @@ -111,6 +125,11 @@ static void watchdog_hardlockup_interrupt_count(void) > > > > void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) > > { > > + if (__this_cpu_read(watchdog_hardlockup_touch)) { > > + __this_cpu_write(watchdog_hardlockup_touch, false); > > + return; > > + } > > If we clear watchdog_hardlockup_touch() here then > watchdog_hardlockup_check() won't be called yet another > watchdog_hrtimer_sample_threshold perior. > > It means that any touch will cause ignoring one full period. > The is_hardlockup() check will be done after full two periods. > > It is not ideal, see below. > > > + > > /* > > * Check for a hardlockup by making sure the CPU's timer > > * interrupt is incrementing. The timer interrupt should have > > diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c > > index 9be90b2a2ea7..547917ebd5d3 100644 > > --- a/kernel/watchdog_perf.c > > +++ b/kernel/watchdog_perf.c > > @@ -112,11 +98,6 @@ static void watchdog_overflow_callback(struct perf_event *event, > > /* Ensure the watchdog never gets throttled */ > > event->hw.interrupts = 0; > > > > - if (__this_cpu_read(watchdog_nmi_touch) == true) { > > - __this_cpu_write(watchdog_nmi_touch, false); > > - return; > > - } > > The original code looks wrong. arch_touch_nmi_watchdog() caused > skipping only one period of the perf event. > > I would expect that it caused restarting the period, > something like: > > if (__this_cpu_read(watchdog_nmi_touch) == true) { > /* > * Restart the period after which the interrupt > * counter is checked. > */ > __this_cpu_write(nmi_rearmed, 0); > __this_cpu_write(last_timestamp, now); > __this_cpu_write(watchdog_nmi_touch, false); > return; > } > > By other words, we should restart the period in the very next perf > event after the watchdog was touched. > > That said, the new code looks better than the original. > IMHO, the original code was prone to false positives. I had a little bit of a hard time following, but I _think_ the "tl;dr" of all the above is that my change is fine. If I misunderstood, please yell. > Best Regards, > Petr > > PS: It might be worth fixing this problem in a separate patch at the > beginning of this patchset. It might be a candidate for stable > backports. Done. It's now its own patch and early in the series.
diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 2c9ea1ba285c..94ff84e1c8ec 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -88,7 +88,10 @@ static inline void hardlockup_detector_disable(void) {} #endif #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) +void arch_touch_nmi_watchdog(void); void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); +#elif !defined(CONFIG_HAVE_NMI_WATCHDOG) +static inline void arch_touch_nmi_watchdog(void) { } #endif #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR) @@ -98,7 +101,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); #endif #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) -extern void arch_touch_nmi_watchdog(void); extern void hardlockup_detector_perf_stop(void); extern void hardlockup_detector_perf_restart(void); extern void hardlockup_detector_perf_disable(void); @@ -113,7 +115,6 @@ static inline void hardlockup_detector_perf_enable(void) { } static inline void hardlockup_detector_perf_cleanup(void) { } # if !defined(CONFIG_HAVE_NMI_WATCHDOG) static inline int hardlockup_detector_perf_init(void) { return -ENODEV; } -static inline void arch_touch_nmi_watchdog(void) {} # else static inline int hardlockup_detector_perf_init(void) { return 0; } # endif diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 367bea0167a5..9c17090611f2 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -90,8 +90,22 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); static DEFINE_PER_CPU(bool, watchdog_hardlockup_processed); +static DEFINE_PER_CPU(bool, watchdog_hardlockup_touch); static unsigned long watchdog_hardlockup_dumped_stacks; +notrace void arch_touch_nmi_watchdog(void) +{ + /* + * Using __raw here because some code paths have + * preemption enabled. If preemption is enabled + * then interrupts should be enabled too, in which + * case we shouldn't have to worry about the watchdog + * going off. + */ + raw_cpu_write(watchdog_hardlockup_touch, true); +} +EXPORT_SYMBOL(arch_touch_nmi_watchdog); + static bool watchdog_hardlockup_is_lockedup(unsigned int cpu) { unsigned long hrint = per_cpu(hrtimer_interrupts, cpu); @@ -111,6 +125,11 @@ static void watchdog_hardlockup_interrupt_count(void) void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) { + if (__this_cpu_read(watchdog_hardlockup_touch)) { + __this_cpu_write(watchdog_hardlockup_touch, false); + return; + } + /* * Check for a hardlockup by making sure the CPU's timer * interrupt is incrementing. The timer interrupt should have diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c index 9be90b2a2ea7..547917ebd5d3 100644 --- a/kernel/watchdog_perf.c +++ b/kernel/watchdog_perf.c @@ -20,26 +20,12 @@ #include <asm/irq_regs.h> #include <linux/perf_event.h> -static DEFINE_PER_CPU(bool, watchdog_nmi_touch); static DEFINE_PER_CPU(struct perf_event *, watchdog_ev); static DEFINE_PER_CPU(struct perf_event *, dead_event); static struct cpumask dead_events_mask; static atomic_t watchdog_cpus = ATOMIC_INIT(0); -notrace void arch_touch_nmi_watchdog(void) -{ - /* - * Using __raw here because some code paths have - * preemption enabled. If preemption is enabled - * then interrupts should be enabled too, in which - * case we shouldn't have to worry about the watchdog - * going off. - */ - raw_cpu_write(watchdog_nmi_touch, true); -} -EXPORT_SYMBOL(arch_touch_nmi_watchdog); - #ifdef CONFIG_HARDLOCKUP_CHECK_TIMESTAMP static DEFINE_PER_CPU(ktime_t, last_timestamp); static DEFINE_PER_CPU(unsigned int, nmi_rearmed); @@ -112,11 +98,6 @@ static void watchdog_overflow_callback(struct perf_event *event, /* Ensure the watchdog never gets throttled */ event->hw.interrupts = 0; - if (__this_cpu_read(watchdog_nmi_touch) == true) { - __this_cpu_write(watchdog_nmi_touch, false); - return; - } - if (!watchdog_check_timestamp()) return;
In preparation for the buddy hardlockup detector, which wants the same petting logic as the current perf hardlockup detector, move the code to watchdog.c. While doing this, rename the global variable to match others nearby. The arch_touch_nmi_watchdog() function is not renamed since that is exported with "EXPORT_SYMBOL" and is thus ABI. Currently the code in watchdog.c is guarded by CONFIG_HARDLOCKUP_DETECTOR_PERF, which makes this change seem silly. However, a future patch will change this. NOTE: there is a slight side effect to this change, though from code analysis I believe it to be a beneficial one. Specifically the perf hardlockup detector will now do check the "timestamp" before clearing any watchdog pets. Previously the order was reversed. With the old order if the watchdog perf event was firing super fast then it would also be clearing existing watchdog pets super fast. The new behavior of handling super-fast perf before clearing watchdog pets seems better. Signed-off-by: Douglas Anderson <dianders@chromium.org> --- Changes in v4: - ("Move perf hardlockup watchdog petting to watchdog.c") new for v4. include/linux/nmi.h | 5 +++-- kernel/watchdog.c | 19 +++++++++++++++++++ kernel/watchdog_perf.c | 19 ------------------- 3 files changed, 22 insertions(+), 21 deletions(-)