Message ID | 20210409141226.2093980-1-rasmus.villemoes@prevas.dk |
---|---|
State | Superseded |
Delegated to: | Priyanka Jain |
Headers | show |
Series | powerpc, wdt: disable ratelimiting when disabling interrupts | expand |
Le 09/04/2021 à 16:12, Rasmus Villemoes a écrit : > On powerpc, time as measured by get_timer() ceases to pass when > interrupts are disabled (since on powerpc get_timer() returns the > value of a volatile variable that gets updated via a timer > interrupt). That in turn means the watchdog_reset() function provided > by CONFIG_WDT ceases to work due to the ratelimiting it imposes. > > Normally, interrupts are just disabled very briefly. However, during > bootm, they are disabled for good prior to decompressing the kernel > image, which can be a somewhat time-consuming operation. Even when we > manage to decompress the kernel and do the other preparation steps and > hand over control to the kernel, the kernel also takes some time > before it is ready to assume responsibility for handling the > watchdog. The end result is that the board gets reset prematurely. > > The ratelimiting isn't really strictly needed (prior to DM WDT, no > such thing existed), so just disable it when we know that time no > longer passes and have watchdog_reset() (e.g. called from > decompression loop) unconditionally reset the watchdog timer. Do we need to make it that complicated ? I think before the generic implementation, powerpc didn't have a rate limitation at all for pinging the watchdog, why not go back this direction, all the time ? I mean we could simply set reset_period to 0 at all time for powerpc ( and change the test to time_after_eq() instead of time_after() ). > > Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> > --- > > I previously sent a patch to change the ratelimiting to be based on > get_ticks() instead of get_timer(), but that has gone nowhere > [1]. This is an alternative which only affects powerpc (and only > boards that have enabled CONFIG_WDT). I hope the watchdog maintainers > will accept at least one of these, or suggest a third alternative, so > I don't have to keep some out-of-tree patch applied without knowing if > that's the direction upstream will take. > > [1] https://patchwork.ozlabs.org/project/uboot/patch/20200605111657.28773-1-rasmus.villemoes@prevas.dk/ > >
On 09/04/2021 16.37, Christophe Leroy wrote: > > > Le 09/04/2021 à 16:12, Rasmus Villemoes a écrit : >> The ratelimiting isn't really strictly needed (prior to DM WDT, no >> such thing existed), so just disable it when we know that time no >> longer passes and have watchdog_reset() (e.g. called from >> decompression loop) unconditionally reset the watchdog timer. > > > Do we need to make it that complicated ? I think before the generic > implementation, powerpc didn't have a rate limitation at all for pinging > the watchdog, why not go back this direction, all the time ? > > I mean we could simply set reset_period to 0 at all time for powerpc ( > and change the test to time_after_eq() instead of time_after() ). Maybe. I think I'd still prefer the one that changes the rate-limiting to be based on get_ticks() instead of get_timer(), which did seem to work everywhere. IIRC, Stefan previously rejected having a config knob or board hook for disabling the ratelimiting - changing the code [the s/after/after_eq/] to effectively allow that by passing hw_margin_ms=0 in DT would probably work, but seems quite hacky (especially when DT is supposed to describe hardware). I've also floated the option of making get_timer() return something sensible even with interrupts off on ppc - that would also solve the problem, and would benefit other generic code that uses get_timer() without knowing its limitation on ppc. https://lists.denx.de/pipermail/u-boot/2020-June/414842.html . So, there are lots of ways of solving it. Which direction to take really depends on which viewpoint one has: (1) the watchdog code should, if rate-limiting at all, use an implementation that works on all platforms [the get_ticks option] (2) the watchdog code should, if rate-limiting at all, provide a way for a board/platform to override or disable that (3) get_timer() on ppc is broken, it needs to provide sensible results at all times, even when interrupts are disabled for tens or hundreds of milliseconds. Rasmus
On 12.04.21 09:04, Rasmus Villemoes wrote: > On 09/04/2021 16.37, Christophe Leroy wrote: >> >> >> Le 09/04/2021 à 16:12, Rasmus Villemoes a écrit : > >>> The ratelimiting isn't really strictly needed (prior to DM WDT, no >>> such thing existed), so just disable it when we know that time no >>> longer passes and have watchdog_reset() (e.g. called from >>> decompression loop) unconditionally reset the watchdog timer. >> >> >> Do we need to make it that complicated ? I think before the generic >> implementation, powerpc didn't have a rate limitation at all for pinging >> the watchdog, why not go back this direction, all the time ? >> >> I mean we could simply set reset_period to 0 at all time for powerpc ( >> and change the test to time_after_eq() instead of time_after() ). > > Maybe. I think I'd still prefer the one that changes the rate-limiting > to be based on get_ticks() instead of get_timer(), which did seem to > work everywhere. > > IIRC, Stefan previously rejected having a config knob or board hook for > disabling the ratelimiting - changing the code [the s/after/after_eq/] > to effectively allow that by passing hw_margin_ms=0 in DT would probably > work, but seems quite hacky (especially when DT is supposed to describe > hardware). I'm not sure if I explicitly rejected any patches, sorry it's been too long. ;) But I also find this idea from Christophe quite applealing, as it's impact on the common code is minimal. Either you set the DT property to 0. Or if you don't have the possibility to do this (no DT on your platform?), then add a new Kconfig option to configure the default value for "reset_period" via Kconfig. > I've also floated the option of making get_timer() return something > sensible even with interrupts off on ppc - that would also solve the > problem, and would benefit other generic code that uses get_timer() > without knowing its limitation on ppc. > https://lists.denx.de/pipermail/u-boot/2020-June/414842.html . > > So, there are lots of ways of solving it. Which direction to take really > depends on which viewpoint one has: > > (1) the watchdog code should, if rate-limiting at all, use an > implementation that works on all platforms [the get_ticks option] > (2) the watchdog code should, if rate-limiting at all, provide a way for > a board/platform to override or disable that > (3) get_timer() on ppc is broken, it needs to provide sensible results > at all times, even when interrupts are disabled for tens or hundreds of > milliseconds. "hw_margin_ms=0" would be my favorite way right now (see above). If this does not work for you, I'm also fine with the latest patch you sent to the list. Thanks, Stefan
diff --git a/arch/powerpc/lib/interrupts.c b/arch/powerpc/lib/interrupts.c index 73f270002c..5c5b5fd7ff 100644 --- a/arch/powerpc/lib/interrupts.c +++ b/arch/powerpc/lib/interrupts.c @@ -11,6 +11,7 @@ #include <irq_func.h> #include <asm/processor.h> #include <watchdog.h> +#include <wdt.h> #ifdef CONFIG_LED_STATUS #include <status_led.h> #endif @@ -43,6 +44,7 @@ static __inline__ void set_dec (unsigned long val) void enable_interrupts(void) { set_msr (get_msr () | MSR_EE); + watchdog_ratelimit(1); } /* returns flag if MSR_EE was set before */ @@ -50,6 +52,7 @@ int disable_interrupts(void) { ulong msr = get_msr (); + watchdog_ratelimit(0); set_msr (msr & ~MSR_EE); return ((msr & MSR_EE) != 0); } diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c index 0603ffbd36..b70a9d50b8 100644 --- a/drivers/watchdog/wdt-uclass.c +++ b/drivers/watchdog/wdt-uclass.c @@ -131,6 +131,12 @@ int wdt_expire_now(struct udevice *dev, ulong flags) return ret; } +static int ratelimit = 1; +void watchdog_ratelimit(int on) +{ + ratelimit = on; +} + #if defined(CONFIG_WATCHDOG) /* * Called by macro WATCHDOG_RESET. This function be called *very* early, @@ -148,7 +154,7 @@ void watchdog_reset(void) /* Do not reset the watchdog too often */ now = get_timer(0); - if (time_after(now, next_reset)) { + if (!ratelimit || time_after(now, next_reset)) { next_reset = now + reset_period; wdt_reset(gd->watchdog_dev); } diff --git a/include/wdt.h b/include/wdt.h index bc242c2eb2..9ba1e62dcf 100644 --- a/include/wdt.h +++ b/include/wdt.h @@ -107,4 +107,10 @@ struct wdt_ops { int initr_watchdog(void); +#if CONFIG_IS_ENABLED(WDT) +void watchdog_ratelimit(int on); +#else +static inline void watchdog_ratelimit(int on) { } +#endif + #endif /* _WDT_H_ */
On powerpc, time as measured by get_timer() ceases to pass when interrupts are disabled (since on powerpc get_timer() returns the value of a volatile variable that gets updated via a timer interrupt). That in turn means the watchdog_reset() function provided by CONFIG_WDT ceases to work due to the ratelimiting it imposes. Normally, interrupts are just disabled very briefly. However, during bootm, they are disabled for good prior to decompressing the kernel image, which can be a somewhat time-consuming operation. Even when we manage to decompress the kernel and do the other preparation steps and hand over control to the kernel, the kernel also takes some time before it is ready to assume responsibility for handling the watchdog. The end result is that the board gets reset prematurely. The ratelimiting isn't really strictly needed (prior to DM WDT, no such thing existed), so just disable it when we know that time no longer passes and have watchdog_reset() (e.g. called from decompression loop) unconditionally reset the watchdog timer. Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> --- I previously sent a patch to change the ratelimiting to be based on get_ticks() instead of get_timer(), but that has gone nowhere [1]. This is an alternative which only affects powerpc (and only boards that have enabled CONFIG_WDT). I hope the watchdog maintainers will accept at least one of these, or suggest a third alternative, so I don't have to keep some out-of-tree patch applied without knowing if that's the direction upstream will take. [1] https://patchwork.ozlabs.org/project/uboot/patch/20200605111657.28773-1-rasmus.villemoes@prevas.dk/ arch/powerpc/lib/interrupts.c | 3 +++ drivers/watchdog/wdt-uclass.c | 8 +++++++- include/wdt.h | 6 ++++++ 3 files changed, 16 insertions(+), 1 deletion(-)