Message ID | 20200313160458.19727-1-rasmus.villemoes@prevas.dk |
---|---|
Headers | show |
Series | watchdog: honour hw_margin_ms property | expand |
On 13.03.20 17:04, Rasmus Villemoes wrote: > Some watchdogs must be reset more often than the once-per-second > ratelimit used by the generic watchdog_reset function in > wdt-uclass.c. There's precedent (from the gpio-wdt driver in linux) > for using a property called hw_margin_ms to let the device tree tell > the driver how often the device needs resetting. So use that > generically. No change in default behaviour. > > On top of https://patchwork.ozlabs.org/patch/1242772/ . > > Stefan, something like this? Yes, thanks for looking into this. > That at least solves half my problems and > might be useful to others as well. Then I'll have to figure out the > time-stands-still problem in some other way. If its too hard to enable interrupts in SPL for you or to provide some other means of a working get_timer() API, then we needto find another solution. You started with this weak function, which of course works. What other options are there? Adding a callback mechanism to register platform specific callback functions? Even though this might get a little bit too complicated. Thanks, Stefan
On 14/03/2020 13.04, Stefan Roese wrote: > On 13.03.20 17:04, Rasmus Villemoes wrote: >> That at least solves half my problems and >> might be useful to others as well. Then I'll have to figure out the >> time-stands-still problem in some other way. > > If its too hard to enable interrupts in SPL for you or to provide some > other means of a working get_timer() API, then we needto find another > solution. You started with this weak function, which of course works. > What other options are there? Adding a callback mechanism to register > platform specific callback functions? Even though this might get a > little bit too complicated. Now that I dig a bit more into this, I seem to remember that we actually also had problems in U-Boot proper when loading a compressed kernel, so for now we're using an uncompressed kernel in our FIT images. I will have to re-investigate, but it now occurs to me that it might be due to the fact that interrupts get disabled during bootm (which makes sense, the same reason I stated previously of interrupt vectors about to be overwritten), so even in U-Boot proper, time as measured by get_timer() ceases to pass after that point, so all the WATCHDOG_RESET() calls from the inflate code effectively get ignored. So it may be necessary to have some wdt_ratelimit_disable() hook that can be called from bootm_disable_interrupts() and e.g. some board-specific SPL code. I'll do some experiments and figure out if I do indeed need such a hook. Rasmus
On 13.03.20 17:04, Rasmus Villemoes wrote: > Some watchdogs must be reset more often than the once-per-second > ratelimit used by the generic watchdog_reset function in > wdt-uclass.c. There's precedent (from the gpio-wdt driver in linux) > for using a property called hw_margin_ms to let the device tree tell > the driver how often the device needs resetting. So use that > generically. No change in default behaviour. > > On top of https://patchwork.ozlabs.org/patch/1242772/ . > > Stefan, something like this? That at least solves half my problems and > might be useful to others as well. Then I'll have to figure out the > time-stands-still problem in some other way. > > Rasmus Villemoes (3): > watchdog: remove stale ifndef CONFIG_WATCHDOG_TIMEOUT_MSECS from wdt.h > watchdog: move initr_watchdog() to wdt-uclass.c > watchdog: honour hw_margin_ms DT property > > drivers/watchdog/wdt-uclass.c | 43 ++++++++++++++++++++++++++++++++++- > include/wdt.h | 38 +------------------------------ > 2 files changed, 43 insertions(+), 38 deletions(-) > Series applied to u-boot-marvell/master Thanks, Stefan
On 16/03/2020 16.52, Rasmus Villemoes wrote: > On 14/03/2020 13.04, Stefan Roese wrote: >> On 13.03.20 17:04, Rasmus Villemoes wrote: > >>> That at least solves half my problems and >>> might be useful to others as well. Then I'll have to figure out the >>> time-stands-still problem in some other way. >> >> If its too hard to enable interrupts in SPL for you or to provide some >> other means of a working get_timer() API, then we needto find another >> solution. You started with this weak function, which of course works. >> What other options are there? Adding a callback mechanism to register >> platform specific callback functions? Even though this might get a >> little bit too complicated. > > Now that I dig a bit more into this, I seem to remember that we actually > also had problems in U-Boot proper when loading a compressed kernel, so > for now we're using an uncompressed kernel in our FIT images. I will > have to re-investigate, but it now occurs to me that it might be due to > the fact that interrupts get disabled during bootm (which makes sense, > the same reason I stated previously of interrupt vectors about to be > overwritten), so even in U-Boot proper, time as measured by get_timer() > ceases to pass after that point, so all the WATCHDOG_RESET() calls from > the inflate code effectively get ignored. > > So it may be necessary to have some wdt_ratelimit_disable() hook that > can be called from bootm_disable_interrupts() and e.g. some > board-specific SPL code. I'll do some experiments and figure out if I do > indeed need such a hook. OK, I have now had time to do some more experiments. I have enabled the timer tick in SPL, so get_timer() now "normally" works. Together with the .dts based read of the hardware margin, that makes the watchdog handling mostly work. But, as I suspected, I do have a problem when loading a compressed kernel image - what I write above "so even in U-Boot proper, time as measured by get_timer() ceases to pass after that point, so all the WATCHDOG_RESET() calls from the inflate code effectively get ignored." is indeed the case. So, what's the best way to proceed? Should there be a hook disabling the rate-limiting logic that bootm_disable_interrupts() can call? Or must get_timer() always return a sensible result even with interrupts disabled? Rasmus
On 02.06.20 15:29, Rasmus Villemoes wrote: > On 16/03/2020 16.52, Rasmus Villemoes wrote: >> On 14/03/2020 13.04, Stefan Roese wrote: >>> On 13.03.20 17:04, Rasmus Villemoes wrote: >> >>>> That at least solves half my problems and >>>> might be useful to others as well. Then I'll have to figure out the >>>> time-stands-still problem in some other way. >>> >>> If its too hard to enable interrupts in SPL for you or to provide some >>> other means of a working get_timer() API, then we needto find another >>> solution. You started with this weak function, which of course works. >>> What other options are there? Adding a callback mechanism to register >>> platform specific callback functions? Even though this might get a >>> little bit too complicated. >> >> Now that I dig a bit more into this, I seem to remember that we actually >> also had problems in U-Boot proper when loading a compressed kernel, so >> for now we're using an uncompressed kernel in our FIT images. I will >> have to re-investigate, but it now occurs to me that it might be due to >> the fact that interrupts get disabled during bootm (which makes sense, >> the same reason I stated previously of interrupt vectors about to be >> overwritten), so even in U-Boot proper, time as measured by get_timer() >> ceases to pass after that point, so all the WATCHDOG_RESET() calls from >> the inflate code effectively get ignored. >> >> So it may be necessary to have some wdt_ratelimit_disable() hook that >> can be called from bootm_disable_interrupts() and e.g. some >> board-specific SPL code. I'll do some experiments and figure out if I do >> indeed need such a hook. > > OK, I have now had time to do some more experiments. I have enabled the > timer tick in SPL, so get_timer() now "normally" works. Together with > the .dts based read of the hardware margin, that makes the watchdog > handling mostly work. > > But, as I suspected, I do have a problem when loading a compressed > kernel image - what I write above "so even in U-Boot proper, time as > measured by get_timer() ceases to pass after that point, so all the > WATCHDOG_RESET() calls from the inflate code effectively get ignored." > is indeed the case. > > So, what's the best way to proceed? Should there be a hook disabling the > rate-limiting logic that bootm_disable_interrupts() can call? Or must > get_timer() always return a sensible result even with interrupts disabled? Wouldn't it make sense to move the bootm_disable_interrupts() call to after loading and uncompressing the OS image? To right before jumping to the OS? Thanks, Stefan
On 02/06/2020 16.53, Stefan Roese wrote: > On 02.06.20 15:29, Rasmus Villemoes wrote: >> On 16/03/2020 16.52, Rasmus Villemoes wrote: >>> On 14/03/2020 13.04, Stefan Roese wrote: >>>> On 13.03.20 17:04, Rasmus Villemoes wrote: >>> >>>>> That at least solves half my problems and >>>>> might be useful to others as well. Then I'll have to figure out the >>>>> time-stands-still problem in some other way. >>>> >>>> If its too hard to enable interrupts in SPL for you or to provide some >>>> other means of a working get_timer() API, then we needto find another >>>> solution. You started with this weak function, which of course works. >>>> What other options are there? Adding a callback mechanism to register >>>> platform specific callback functions? Even though this might get a >>>> little bit too complicated. >>> >>> Now that I dig a bit more into this, I seem to remember that we actually >>> also had problems in U-Boot proper when loading a compressed kernel, so >>> for now we're using an uncompressed kernel in our FIT images. I will >>> have to re-investigate, but it now occurs to me that it might be due to >>> the fact that interrupts get disabled during bootm (which makes sense, >>> the same reason I stated previously of interrupt vectors about to be >>> overwritten), so even in U-Boot proper, time as measured by get_timer() >>> ceases to pass after that point, so all the WATCHDOG_RESET() calls from >>> the inflate code effectively get ignored. >>> >>> So it may be necessary to have some wdt_ratelimit_disable() hook that >>> can be called from bootm_disable_interrupts() and e.g. some >>> board-specific SPL code. I'll do some experiments and figure out if I do >>> indeed need such a hook. >> >> OK, I have now had time to do some more experiments. I have enabled the >> timer tick in SPL, so get_timer() now "normally" works. Together with >> the .dts based read of the hardware margin, that makes the watchdog >> handling mostly work. >> >> But, as I suspected, I do have a problem when loading a compressed >> kernel image - what I write above "so even in U-Boot proper, time as >> measured by get_timer() ceases to pass after that point, so all the >> WATCHDOG_RESET() calls from the inflate code effectively get ignored." >> is indeed the case. >> >> So, what's the best way to proceed? Should there be a hook disabling the >> rate-limiting logic that bootm_disable_interrupts() can call? Or must >> get_timer() always return a sensible result even with interrupts >> disabled? > > Wouldn't it make sense to move the bootm_disable_interrupts() call to > after loading and uncompressing the OS image? To right before jumping > to the OS? No, because the point of disabling interrupts is that we may start writing to physical address 0 (e.g. if that's the load= address in the FIT image), which is also where the interrupt vectors reside - i.e., we're about to overwrite 0x900 (the decrementer interrupt vector), so if we don't disable interrupts, we'll crash on the very next decrementer interrupt (i.e., within one millisecond). Rasmus
On 02/06/2020 17.38, Rasmus Villemoes wrote: > On 02/06/2020 16.53, Stefan Roese wrote: >> On 02.06.20 15:29, Rasmus Villemoes wrote: >>> On 16/03/2020 16.52, Rasmus Villemoes wrote: >>>> On 14/03/2020 13.04, Stefan Roese wrote: >>>>> On 13.03.20 17:04, Rasmus Villemoes wrote: >>>> >>> But, as I suspected, I do have a problem when loading a compressed >>> kernel image - what I write above "so even in U-Boot proper, time as >>> measured by get_timer() ceases to pass after that point, so all the >>> WATCHDOG_RESET() calls from the inflate code effectively get ignored." >>> is indeed the case. >>> >>> So, what's the best way to proceed? Should there be a hook disabling the >>> rate-limiting logic that bootm_disable_interrupts() can call? Or must >>> get_timer() always return a sensible result even with interrupts >>> disabled? >> >> Wouldn't it make sense to move the bootm_disable_interrupts() call to >> after loading and uncompressing the OS image? To right before jumping >> to the OS? > > No, because the point of disabling interrupts is that we may start > writing to physical address 0 (e.g. if that's the load= address in the > FIT image), which is also where the interrupt vectors reside - i.e., > we're about to overwrite 0x900 (the decrementer interrupt vector), so if > we don't disable interrupts, we'll crash on the very next decrementer > interrupt (i.e., within one millisecond). FWIW, the below very hacky patch makes get_timer() return sensible results on ppc even when interrupts are disabled, and hence ensures that the watchdog does get petted. It's obviously not meant for inclusion as is (it's prepared for being a proper config option, but for toying around it's easier to have it all in one file - also, I don't really like the name of the config knob). But it's also kind of expensive to do that do_div(), so I'm not sure I think this is even the right approach. You previously rejected allowing the board to provide an override for the rate-limiting, and at least the "hw_margin_ms" parsing now solves part of what I wanted to use that for. What about implementing the rate-limiting instead in terms of get_ticks() (the hw_margin_ms can trivially be translated to a number of ticks at init time - there's already a usec_to_tick helper)? Are there any boards where get_ticks() doesn't return something sensible? Rasmus _Not_ for inclusion: diff --git a/arch/powerpc/lib/interrupts.c b/arch/powerpc/lib/interrupts.c index 23ac5bca1e..e6b6a967ae 100644 --- a/arch/powerpc/lib/interrupts.c +++ b/arch/powerpc/lib/interrupts.c @@ -10,11 +10,14 @@ #include <common.h> #include <irq_func.h> #include <asm/processor.h> +#include <div64.h> #include <watchdog.h> #ifdef CONFIG_LED_STATUS #include <status_led.h> #endif +#define CONFIG_GET_TIMER_IRQ 1 + #ifndef CONFIG_MPC83XX_TIMER #ifndef CONFIG_SYS_WATCHDOG_FREQ #define CONFIG_SYS_WATCHDOG_FREQ (CONFIG_SYS_HZ / 2) @@ -39,8 +42,33 @@ static __inline__ void set_dec (unsigned long val) } #endif /* !CONFIG_MPC83XX_TIMER */ +static u64 irq_off_ticks = 0; +static int interrupts_enabled = 0; +static volatile ulong timestamp = 0; + +static u32 irq_off_msecs(void) +{ + u64 t; + u32 d = get_tbclk(); + + if (!d) + return 0; + t = get_ticks() - irq_off_ticks; + t *= 1000; + do_div(t, d); + return t; +} + void enable_interrupts(void) { + ulong msr = get_msr (); + + if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && !(msr & MSR_EE)) { + /* Account for the time interrupts were off. */ + timestamp += irq_off_msecs(); + interrupts_enabled = 1; + } + set_msr (get_msr () | MSR_EE); } @@ -50,6 +78,13 @@ int disable_interrupts(void) ulong msr = get_msr (); set_msr (msr & ~MSR_EE); + + if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && (msr & MSR_EE)) { + /* Record when interrupts were disabled. */ + irq_off_ticks = get_ticks(); + interrupts_enabled = 0; + } + return ((msr & MSR_EE) != 0); } @@ -61,13 +96,11 @@ int interrupt_init(void) set_dec (decrementer_count); - set_msr (get_msr () | MSR_EE); + enable_interrupts(); return (0); } -static volatile ulong timestamp = 0; - void timer_interrupt(struct pt_regs *regs) { /* call cpu specific function from $(CPU)/interrupts.c */ @@ -90,6 +123,12 @@ void timer_interrupt(struct pt_regs *regs) ulong get_timer (ulong base) { - return (timestamp - base); + ulong ts = timestamp; + + /* If called within an irq-off section, account for the time since irqs were turned off. */ + if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && !interrupts_enabled) + ts += irq_off_msecs(); + + return (ts - base); } #endif /* !CONFIG_MPC83XX_TIMER */
On 04.06.20 10:28, Rasmus Villemoes wrote: > On 02/06/2020 17.38, Rasmus Villemoes wrote: >> On 02/06/2020 16.53, Stefan Roese wrote: >>> On 02.06.20 15:29, Rasmus Villemoes wrote: >>>> On 16/03/2020 16.52, Rasmus Villemoes wrote: >>>>> On 14/03/2020 13.04, Stefan Roese wrote: >>>>>> On 13.03.20 17:04, Rasmus Villemoes wrote: >>>>> > >>>> But, as I suspected, I do have a problem when loading a compressed >>>> kernel image - what I write above "so even in U-Boot proper, time as >>>> measured by get_timer() ceases to pass after that point, so all the >>>> WATCHDOG_RESET() calls from the inflate code effectively get ignored." >>>> is indeed the case. >>>> >>>> So, what's the best way to proceed? Should there be a hook disabling the >>>> rate-limiting logic that bootm_disable_interrupts() can call? Or must >>>> get_timer() always return a sensible result even with interrupts >>>> disabled? >>> >>> Wouldn't it make sense to move the bootm_disable_interrupts() call to >>> after loading and uncompressing the OS image? To right before jumping >>> to the OS? >> >> No, because the point of disabling interrupts is that we may start >> writing to physical address 0 (e.g. if that's the load= address in the >> FIT image), which is also where the interrupt vectors reside - i.e., >> we're about to overwrite 0x900 (the decrementer interrupt vector), so if >> we don't disable interrupts, we'll crash on the very next decrementer >> interrupt (i.e., within one millisecond). Ah, thanks for refreshing me on this. > FWIW, the below very hacky patch makes get_timer() return sensible > results on ppc even when interrupts are disabled, and hence ensures that > the watchdog does get petted. It's obviously not meant for inclusion as > is (it's prepared for being a proper config option, but for toying > around it's easier to have it all in one file - also, I don't really > like the name of the config knob). But it's also kind of expensive to do > that do_div(), so I'm not sure I think this is even the right approach. I agree. This does not look as its going into the right direction. But thanks for working on this anyways. > You previously rejected allowing the board to provide an override for > the rate-limiting, From my memory, I "just" suggested working on a different, more generic approach. But if this fails, I'm open to re-visit the options. > and at least the "hw_margin_ms" parsing now solves > part of what I wanted to use that for. What about implementing the > rate-limiting instead in terms of get_ticks() (the hw_margin_ms can > trivially be translated to a number of ticks at init time - there's > already a usec_to_tick helper)? Are there any boards where get_ticks() > doesn't return something sensible? Could you please send a new version of a patch(set) to address these issues by using the "override for the rate-limiting" or some other idea you have right now for this? I'll review it soon'ish. Thanks, Stefan > Rasmus > > _Not_ for inclusion: > > diff --git a/arch/powerpc/lib/interrupts.c b/arch/powerpc/lib/interrupts.c > index 23ac5bca1e..e6b6a967ae 100644 > --- a/arch/powerpc/lib/interrupts.c > +++ b/arch/powerpc/lib/interrupts.c > @@ -10,11 +10,14 @@ > #include <common.h> > #include <irq_func.h> > #include <asm/processor.h> > +#include <div64.h> > #include <watchdog.h> > #ifdef CONFIG_LED_STATUS > #include <status_led.h> > #endif > > +#define CONFIG_GET_TIMER_IRQ 1 > + > #ifndef CONFIG_MPC83XX_TIMER > #ifndef CONFIG_SYS_WATCHDOG_FREQ > #define CONFIG_SYS_WATCHDOG_FREQ (CONFIG_SYS_HZ / 2) > @@ -39,8 +42,33 @@ static __inline__ void set_dec (unsigned long val) > } > #endif /* !CONFIG_MPC83XX_TIMER */ > > +static u64 irq_off_ticks = 0; > +static int interrupts_enabled = 0; > +static volatile ulong timestamp = 0; > + > +static u32 irq_off_msecs(void) > +{ > + u64 t; > + u32 d = get_tbclk(); > + > + if (!d) > + return 0; > + t = get_ticks() - irq_off_ticks; > + t *= 1000; > + do_div(t, d); > + return t; > +} > + > void enable_interrupts(void) > { > + ulong msr = get_msr (); > + > + if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && !(msr & MSR_EE)) { > + /* Account for the time interrupts were off. */ > + timestamp += irq_off_msecs(); > + interrupts_enabled = 1; > + } > + > set_msr (get_msr () | MSR_EE); > } > > @@ -50,6 +78,13 @@ int disable_interrupts(void) > ulong msr = get_msr (); > > set_msr (msr & ~MSR_EE); > + > + if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && (msr & MSR_EE)) { > + /* Record when interrupts were disabled. */ > + irq_off_ticks = get_ticks(); > + interrupts_enabled = 0; > + } > + > return ((msr & MSR_EE) != 0); > } > > @@ -61,13 +96,11 @@ int interrupt_init(void) > > set_dec (decrementer_count); > > - set_msr (get_msr () | MSR_EE); > + enable_interrupts(); > > return (0); > } > > -static volatile ulong timestamp = 0; > - > void timer_interrupt(struct pt_regs *regs) > { > /* call cpu specific function from $(CPU)/interrupts.c */ > @@ -90,6 +123,12 @@ void timer_interrupt(struct pt_regs *regs) > > ulong get_timer (ulong base) > { > - return (timestamp - base); > + ulong ts = timestamp; > + > + /* If called within an irq-off section, account for the time since > irqs were turned off. */ > + if (IS_ENABLED(CONFIG_GET_TIMER_IRQ) && !interrupts_enabled) > + ts += irq_off_msecs(); > + > + return (ts - base); > } > #endif /* !CONFIG_MPC83XX_TIMER */ > > > Viele Grüße, Stefan