Message ID | 20230504151100.v4.13.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid (mailing list archive)
---|---
State | Handled Elsewhere, archived
Series | watchdog/hardlockup: Add the buddy hardlockup detector
On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> From: Colin Cross <ccross@android.com>
>
> Implement a hardlockup detector that doesn't need any extra
> arch-specific support code to detect lockups. Instead of using
> something arch-specific we will use the buddy system, where each CPU
> watches out for another one. Specifically, each CPU will use its
> softlockup hrtimer to check that the next CPU is processing hrtimer
> interrupts by verifying that a counter is increasing.

Powerpc's watchdog has an SMP checker, did you see it? It's all to all
rather than buddy, which makes it more complicated but arguably gives a
bit better functionality.

Thanks,
Nick
Hi,

On Thu, May 4, 2023 at 7:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> > From: Colin Cross <ccross@android.com>
> >
> > Implement a hardlockup detector that doesn't need any extra
> > arch-specific support code to detect lockups. Instead of using
> > something arch-specific we will use the buddy system, where each CPU
> > watches out for another one. Specifically, each CPU will use its
> > softlockup hrtimer to check that the next CPU is processing hrtimer
> > interrupts by verifying that a counter is increasing.
>
> Powerpc's watchdog has an SMP checker, did you see it?

No, I wasn't aware of it. Interesting; it seems to basically enable
both types of hardlockup detectors together. If that really catches
more lockups, it seems like we could do the same thing for the buddy
system.

If people want, I don't think it would be very hard to make the buddy
system _not_ exclusive of the perf system. Instead of having the buddy
system implement the "weak" functions, I could just call the buddy
functions in the right places directly and leave the "weak" functions
for a more traditional hardlockup detector to implement. Opinions?

Maybe after all this lands, the powerpc watchdog could move to use the
common code? As evidenced by this patch series, there's not really a
reason for the SMP detection to be platform specific.

> It's all to all rather than buddy, which makes it more complicated
> but arguably gives a bit better functionality.

Can you come up with an example crash where the "all to all" approach
would work better than the simple buddy system provided by this patch?
It seems like they would be equivalent, but I could be missing
something. Specifically, they both need at least one non-locked-up CPU
to detect a problem. If one or more CPUs is locked up then we'll always
detect it.

I suppose maybe you could provide a better error message at lockup time
saying that several CPUs were locked up, and that could be helpful. For
now, I'd keep the current buddy system the way it is, and if you want
to provide a patch improving things to be "all-to-all" in the future
that would be interesting to review.
On Sat May 6, 2023 at 2:35 AM AEST, Doug Anderson wrote:
> Hi,
>
> On Thu, May 4, 2023 at 7:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> > > From: Colin Cross <ccross@android.com>
> > >
> > > Implement a hardlockup detector that doesn't need any extra
> > > arch-specific support code to detect lockups. Instead of using
> > > something arch-specific we will use the buddy system, where each CPU
> > > watches out for another one. Specifically, each CPU will use its
> > > softlockup hrtimer to check that the next CPU is processing hrtimer
> > > interrupts by verifying that a counter is increasing.
> >
> > Powerpc's watchdog has an SMP checker, did you see it?
>
> No, I wasn't aware of it. Interesting; it seems to basically enable
> both types of hardlockup detectors together. If that really catches
> more lockups, it seems like we could do the same thing for the buddy
> system.

It doesn't catch more lockups. On powerpc we don't have a reliable
periodic NMI, hence the SMP checker. But it is preferable that a CPU
detects its own lockup, because NMI IPIs can result in crashes if they
are taken in certain critical sections.

> If people want, I don't think it would be very hard to make the buddy
> system _not_ exclusive of the perf system. Instead of having the buddy
> system implement the "weak" functions, I could just call the buddy
> functions in the right places directly and leave the "weak" functions
> for a more traditional hardlockup detector to implement. Opinions?
>
> Maybe after all this lands, the powerpc watchdog could move to use the
> common code? As evidenced by this patch series, there's not really a
> reason for the SMP detection to be platform specific.

The powerpc SMP checker could certainly move to common code if others
wanted to use it.

> > It's all to all rather than buddy, which makes it more complicated
> > but arguably gives a bit better functionality.
>
> Can you come up with an example crash where the "all to all" approach
> would work better than the simple buddy system provided by this patch?

  CPU2                    CPU3
  spin_lock_irqsave(A)    spin_lock_irqsave(B)
  spin_lock_irqsave(B)    spin_lock_irqsave(A)

CPU1 will detect the lockup on CPU2, but CPU3's lockup won't be
detected, so we don't get the trace that can diagnose the bug.

Another thing I actually found it useful for is that you can easily see
if a core (i.e., all threads in the core) or a chip has died. Maybe
more useful when doing presilicon and bring-up work or firmware
hacking, but still useful.

Thanks,
Nick

> It seems like they would be equivalent, but I could be missing
> something. Specifically, they both need at least one non-locked-up CPU
> to detect a problem. If one or more CPUs is locked up then we'll
> always detect it. I suppose maybe you could provide a better error
> message at lockup time saying that several CPUs were locked up, and
> that could be helpful. For now, I'd keep the current buddy system the
> way it is, and if you want to provide a patch improving things to be
> "all-to-all" in the future that would be interesting to review.
Hi,

On Sun, May 7, 2023 at 6:05 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> > No, I wasn't aware of it. Interesting; it seems to basically enable
> > both types of hardlockup detectors together. If that really catches
> > more lockups, it seems like we could do the same thing for the buddy
> > system.
>
> It doesn't catch more lockups. On powerpc we don't have a reliable
> periodic NMI, hence the SMP checker. But it is preferable that a CPU
> detects its own lockup, because NMI IPIs can result in crashes if
> they are taken in certain critical sections.

Ah, interesting. Is the fact that NMI IPIs can crash the system
something specific to the way they're implemented in powerpc, or is
this something common in Linux? I've been assuming that IPI NMIs would
be roughly as reliable as (or perhaps more reliable than) the NMI
timer/perf counter.

> > > It's all to all rather than buddy, which makes it more complicated
> > > but arguably gives a bit better functionality.
> >
> > Can you come up with an example crash where the "all to all"
> > approach would work better than the simple buddy system provided by
> > this patch?
>
>   CPU2                    CPU3
>   spin_lock_irqsave(A)    spin_lock_irqsave(B)
>   spin_lock_irqsave(B)    spin_lock_irqsave(A)
>
> CPU1 will detect the lockup on CPU2, but CPU3's lockup won't be
> detected, so we don't get the trace that can diagnose the bug.

Ah, that makes sense as a benefit if
"sysctl_hardlockup_all_cpu_backtrace" is false, which you'd probably
want if you had massively SMP systems. ...of course, if you had a lot
of cores then I'd imagine the "all-to-all" might become a lot of
overhead...

Hmmm, but I don't think you really need "all-to-all" checking to get
the stacktraces you want, do you? Each CPU can be "watching" exactly
one other CPU, but then when we actually lock up we could check all of
them and dump stacks on all the ones that are locked up. I think this
would be a fairly easy improvement for the buddy system. I'll leave it
out for now just to keep things simple for the initial landing, but it
wouldn't be hard to add. Then I think the two SMP systems (buddy vs.
all-to-all) would be equivalent in terms of functionality?

Thinking about this more, I guess you must be assuming coordination
between the "SMP" and "non-SMP" checkers. Specifically, I guess you'd
want some guarantee that the "SMP" checker always fires before the
non-SMP checker, because that would have a higher chance of getting a
stack trace without tickling the IPI NMI crash. ...but then once the
non-SMP checker printed its own stack trace, you'd like the SMP checker
running on the same CPU to check whether any other CPUs were locked up
so you could get their stacks as well. With my simplistic solution of
just allowing the buddy detector to be enabled in parallel with a
perf-based detector, we wouldn't have this level of coordination, but
I'll assume that's OK for the initial landing.

> Another thing I actually found it useful for is that you can easily
> see if a core (i.e., all threads in the core) or a chip has died.
> Maybe more useful when doing presilicon and bring-up work or firmware
> hacking, but still useful.

I'm probably biased, but for bringup work and stuff like this I usually
have kdb/kgdb enabled, and then I can get stack traces for whichever
CPUs I want. :-P

-Doug
Hi,

On Mon, May 8, 2023 at 8:52 AM Doug Anderson <dianders@chromium.org> wrote:
>
> Hmmm, but I don't think you really need "all-to-all" checking to get
> the stacktraces you want, do you? Each CPU can be "watching" exactly
> one other CPU, but then when we actually lock up we could check all of
> them and dump stacks on all the ones that are locked up. I think this
> would be a fairly easy improvement for the buddy system. I'll leave it
> out for now just to keep things simple for the initial landing, but it
> wouldn't be hard to add. Then I think the two SMP systems (buddy vs.
> all-to-all) would be equivalent in terms of functionality?

FWIW, I take back my "this would be fairly easy" comment. :-P ...or, at
least, I'll acknowledge that the easy way has some tradeoffs. It
wouldn't be trivially easy to just snoop on the data of the other
buddies, because the watching processors aren't necessarily
synchronized with each other.

That being said, if someone really wanted to report on other locked
CPUs before doing a panic() and was willing to delay the panic, it
probably wouldn't be too hard to add a mode where the CPU that detects
the first lockup does some extra work to look for other lockups. Maybe
it could send a normal IPI to other CPUs and see if they respond, or
maybe it could take over monitoring all CPUs and wait one extra period.
In any case, I'm not planning on implementing this now, but at least
wanted to document my thoughts. ;-)

> With my simplistic solution of just allowing the buddy detector to be
> enabled in parallel with a perf-based detector, we wouldn't have this
> level of coordination, but I'll assume that's OK for the initial
> landing.

I dug into this more as well, and I also wanted to note that, at least
for now, I'm not going to include support for turning on both the buddy
and perf lockup detectors in the common core. In order to do this and
not have them stomp on each other, I think we'd need extra coordination
or two copies of the interrupt count / saved interrupt count, and, at
least at this point in time, it doesn't seem worth it for a halfway
solution. From everything I've heard, there is a push on many x86
machines to get off the perf lockup detector anyway to free up the
resources. Someone could look at adding this complexity later.

-Doug
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 094a0e7ed97d..90aa33317b4c 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -87,8 +87,9 @@ extern unsigned int hardlockup_panic;
 static inline void hardlockup_detector_disable(void) {}
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER)
 void arch_touch_nmi_watchdog(void);
+void watchdog_hardlockup_touch_cpu(unsigned int cpu);
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 #elif !defined(CONFIG_HAVE_NMI_WATCHDOG)
 static inline void arch_touch_nmi_watchdog(void) { }
@@ -118,6 +119,12 @@ void watchdog_hardlockup_disable(unsigned int cpu);
 
 void lockup_detector_reconfigure(void);
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY
+void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts);
+#else
+static inline void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) {}
+#endif
+
 /**
  * touch_hardlockup_watchdog - manually pet the hardlockup watchdog.
  *
diff --git a/kernel/Makefile b/kernel/Makefile
index 406ccccc4dd3..3f4f7975a8b8 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -91,6 +91,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) += watchdog_buddy.o
 obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_perf.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index e21896a0a9d5..678d55172ef4 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER)
 
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
@@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void)
 }
 EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
+void watchdog_hardlockup_touch_cpu(unsigned int cpu)
+{
+	per_cpu(watchdog_hardlockup_touch, cpu) = true;
+
+	/* Match with smp_rmb() in watchdog_hardlockup_check() */
+	smp_wmb();
+}
+
 static bool watchdog_hardlockup_is_lockedup(unsigned int cpu)
 {
 	unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
@@ -118,13 +126,16 @@ static bool watchdog_hardlockup_is_lockedup(unsigned int cpu)
 	return false;
 }
 
-static void watchdog_hardlockup_interrupt_count(void)
+static unsigned long watchdog_hardlockup_interrupt_count(void)
 {
-	__this_cpu_inc(hrtimer_interrupts);
+	return __this_cpu_inc_return(hrtimer_interrupts);
 }
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
+	/* Match with smp_wmb() in watchdog_hardlockup_touch_cpu() */
+	smp_rmb();
+
 	if (__this_cpu_read(watchdog_hardlockup_touch)) {
 		__this_cpu_write(watchdog_hardlockup_touch, false);
 		return;
@@ -183,11 +194,11 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	}
 }
 
-#else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */
+#else /* CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */
 
-static inline void watchdog_hardlockup_interrupt_count(void) { }
+static inline unsigned long watchdog_hardlockup_interrupt_count(void) { return 0; }
 
-#endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */
+#endif /* !CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */
 
 /*
  * These functions can be overridden based on the configured hardlockdup detector.
@@ -446,12 +457,16 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	struct pt_regs *regs = get_irq_regs();
 	int duration;
 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
+	unsigned long hrtimer_interrupts;
 
 	if (!watchdog_enabled)
 		return HRTIMER_NORESTART;
 
 	/* kick the hardlockup detector */
-	watchdog_hardlockup_interrupt_count();
+	hrtimer_interrupts = watchdog_hardlockup_interrupt_count();
+
+	/* test for hardlockups */
+	watchdog_buddy_check_hardlockup(hrtimer_interrupts);
 
 	/* kick the softlockup detector */
 	if (completion_done(this_cpu_ptr(&softlockup_completion))) {
diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
new file mode 100644
index 000000000000..fee45af2e5bd
--- /dev/null
+++ b/kernel/watchdog_buddy.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/kernel.h>
+#include <linux/nmi.h>
+#include <linux/percpu-defs.h>
+
+static cpumask_t __read_mostly watchdog_cpus;
+
+static unsigned int watchdog_next_cpu(unsigned int cpu)
+{
+	cpumask_t cpus = watchdog_cpus;
+	unsigned int next_cpu;
+
+	next_cpu = cpumask_next(cpu, &cpus);
+	if (next_cpu >= nr_cpu_ids)
+		next_cpu = cpumask_first(&cpus);
+
+	if (next_cpu == cpu)
+		return nr_cpu_ids;
+
+	return next_cpu;
+}
+
+int __init watchdog_hardlockup_probe(void)
+{
+	return 0;
+}
+
+void watchdog_hardlockup_enable(unsigned int cpu)
+{
+	unsigned int next_cpu;
+
+	/*
+	 * The new CPU will be marked online before the hrtimer interrupt
+	 * gets a chance to run on it. If another CPU tests for a
+	 * hardlockup on the new CPU before it has run its the hrtimer
+	 * interrupt, it will get a false positive. Touch the watchdog on
+	 * the new CPU to delay the check for at least 3 sampling periods
+	 * to guarantee one hrtimer has run on the new CPU.
+	 */
+	watchdog_hardlockup_touch_cpu(cpu);
+
+	/*
+	 * We are going to check the next CPU. Our watchdog_hrtimer
+	 * need not be zero if the CPU has already been online earlier.
+	 * Touch the watchdog on the next CPU to avoid false positive
+	 * if we try to check it in less then 3 interrupts.
+	 */
+	next_cpu = watchdog_next_cpu(cpu);
+	if (next_cpu < nr_cpu_ids)
+		watchdog_hardlockup_touch_cpu(next_cpu);
+
+	cpumask_set_cpu(cpu, &watchdog_cpus);
+}
+
+void watchdog_hardlockup_disable(unsigned int cpu)
+{
+	unsigned int next_cpu = watchdog_next_cpu(cpu);
+
+	/*
+	 * Offlining this CPU will cause the CPU before this one to start
+	 * checking the one after this one. If this CPU just finished checking
+	 * the next CPU and updating hrtimer_interrupts_saved, and then the
+	 * previous CPU checks it within one sample period, it will trigger a
+	 * false positive. Touch the watchdog on the next CPU to prevent it.
+	 */
+	if (next_cpu < nr_cpu_ids)
+		watchdog_hardlockup_touch_cpu(next_cpu);
+
+	cpumask_clear_cpu(cpu, &watchdog_cpus);
+}
+
+void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts)
+{
+	unsigned int next_cpu;
+
+	/*
+	 * Test for hardlockups every 3 samples. The sample period is
+	 * watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
+	 * watchdog_thresh (over by 20%).
+	 */
+	if (hrtimer_interrupts % 3 != 0)
+		return;
+
+	/* check for a hardlockup on the next CPU */
+	next_cpu = watchdog_next_cpu(smp_processor_id());
+	if (next_cpu >= nr_cpu_ids)
+		return;
+
+	watchdog_hardlockup_check(next_cpu, NULL);
+}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 39d1d93164bd..a80f6b8a21eb 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1025,10 +1025,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
 
 	  Say N if unsure.
 
-config HARDLOCKUP_DETECTOR_PERF
+# Both the "perf" and "buddy" hardlockup detectors count hrtimer
+# interrupts. This config enables functions managing this common code.
+config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
 	bool
 	select SOFTLOCKUP_DETECTOR
 
+config HARDLOCKUP_DETECTOR_PERF
+	bool
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF
+	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
+
+config HARDLOCKUP_DETECTOR_BUDDY
+	bool
+	depends on SMP
+	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
+
+# For hardlockup detectors you can have one directly provided by the arch
+# or use a "non-arch" one. If you're using a "non-arch" one that is
+# further divided the perf hardlockup detector (which, confusingly, needs
+# arch-provided perf support) and the buddy hardlockup detector (which just
+# needs SMP). In either case, using the "non-arch" code conflicts with
+# the NMI watchdog code (which is sometimes used directly and sometimes used
+# by the arch-provided hardlockup detector).
+config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+	bool
+	depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG
+	default y
+
+config HARDLOCKUP_DETECTOR_PREFER_BUDDY
+	bool "Prefer the buddy CPU hardlockup detector"
+	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
+	help
+	  Say Y here to prefer the buddy hardlockup detector over the perf one.
+
+	  With the buddy detector, each CPU uses its softlockup hrtimer
+	  to check that the next CPU is processing hrtimer interrupts by
+	  verifying that a counter is increasing.
+
+	  This hardlockup detector is useful on systems that don't have
+	  an arch-specific hardlockup detector or if resources needed
+	  for the hardlockup detector are better used for other things.
+
+# This will select the appropriate non-arch hardlockdup detector
+config HARDLOCKUP_DETECTOR_NON_ARCH
+	bool
+	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+	select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
+	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
+
 #
 # Enables a timestamp based low pass filter to compensate for perf based
 # hard lockup detection which runs too fast due to turbo modes.
@@ -1043,9 +1088,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP
 config HARDLOCKUP_DETECTOR
 	bool "Detect Hard Lockups"
 	depends on DEBUG_KERNEL && !S390
-	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
+	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
 	select LOCKUP_DETECTOR
-	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
+	select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+
 	help
	  Say Y here to enable the kernel to act as a watchdog to detect
	  hard lockups.