Message ID | 20200605111657.28773-1-rasmus.villemoes@prevas.dk |
---|---|
State | RFC |
Delegated to: | Tom Rini |
Headers | show |
Series | [RFC] watchdog: base rate-limiting on get_ticks() rather than get_timer() | expand |
On 05.06.20 13:16, Rasmus Villemoes wrote: > On powerpc, get_timer() is implemented using a volatile variable that > gets incremented from the decrementer interrupt handler. Hence, when > interrupts are disabled, time as measured by get_timer() ceases to > pass. > > Interrupts are necessarily disabled during bootm (see the comment in > bootm_disable_interrupts() for why). But after interrupts are > disabled, there's still lots of work to do - e.g. decompressing the > kernel image to the right load address. Unfortunately, the > rate-limiting logic in wdt_uclass.c's watchdog_reset function means > that WATCHDOG_RESET() becomes a no-op, since get_timer() never > progresses past the next_reset. > > This, combined with an external gpio-triggered watchdog that must be > petted at least every 800ms, means our board gets reset before booting > into linux. > > Now, at least on powerpc, get_ticks() continues to return sensible > results whether or not interrupts are enabled. So this fixes the above > problem for our board - but I don't know if get_ticks() can be assumed > to always work. > > Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> > --- > This is what I had in mind. I also considered making it a config knob > (possibly just auto-selected based on $ARCH) whether to use > get_timer() or get_ticks(), but that becomes quite ugly. I hesitate a bit, moving with this generic code from get_timer() to get_ticks() for all boards. Did you test this patch on other platforms, like some ARM boards? Please note that I don't reject it - just asking. Thanks, Stefan > drivers/watchdog/wdt-uclass.c | 22 ++++++++++++---------- > 1 file changed, 12 insertions(+), 10 deletions(-) > > diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c > index 4cdb7bd64c..7be4e9b5bc 100644 > --- a/drivers/watchdog/wdt-uclass.c > +++ b/drivers/watchdog/wdt-uclass.c > @@ -16,14 +16,15 @@ DECLARE_GLOBAL_DATA_PTR; > > #define WATCHDOG_TIMEOUT_SECS (CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000) > > -/* > - * Reset every 1000ms, or however often is required as indicated by a > - * hw_margin_ms property. > - */ > -static ulong reset_period = 1000; > +static u64 ratelimit_ticks; > > int initr_watchdog(void) > { > + /* > + * Reset every 1000ms, or however often is required as indicated by a > + * hw_margin_ms property. > + */ > + u32 reset_period = 1000; > u32 timeout = WATCHDOG_TIMEOUT_SECS; > > /* > @@ -48,6 +49,7 @@ int initr_watchdog(void) > 4 * reset_period) / 4; > } > > + ratelimit_ticks = usec_to_tick(reset_period * 1000); > wdt_start(gd->watchdog_dev, timeout * 1000, 0); > gd->flags |= GD_FLG_WDT_READY; > printf("WDT: Started with%s servicing (%ds timeout)\n", > @@ -117,17 +119,17 @@ int wdt_expire_now(struct udevice *dev, ulong flags) > */ > void watchdog_reset(void) > { > - static ulong next_reset; > - ulong now; > + static u64 last_reset; > + u64 now; > > /* Exit if GD is not ready or watchdog is not initialized yet */ > if (!gd || !(gd->flags & GD_FLG_WDT_READY)) > return; > > /* Do not reset the watchdog too often */ > - now = get_timer(0); > - if (time_after(now, next_reset)) { > - next_reset = now + reset_period; > + now = get_ticks(); > + if (now - last_reset >= ratelimit_ticks) { > + last_reset = now; > wdt_reset(gd->watchdog_dev); > } > } > Viele Grüße, Stefan
On 05/06/2020 13.48, Stefan Roese wrote: > On 05.06.20 13:16, Rasmus Villemoes wrote: >> This is what I had in mind. I also considered making it a config knob >> (possibly just auto-selected based on $ARCH) whether to use >> get_timer() or get_ticks(), but that becomes quite ugly. > > I hesitate a bit, moving with this generic code from get_timer() > to get_ticks() for all boards. Did you test this patch on other > platforms, like some ARM boards? > > Please note that I don't reject it - just asking. Yeah, I'm not really too happy about it myself, exactly because it affects all arches/platforms. And no, I don't have other hardware handy unfortunately. So it's very much an RFC where I hope someone with knowledge of the various arches can say whether one can expect get_ticks() to be at least as widely available as get_timer(). Rasmus
On 05.06.20 14:11, Rasmus Villemoes wrote: > On 05/06/2020 13.48, Stefan Roese wrote: >> On 05.06.20 13:16, Rasmus Villemoes wrote: >>> This is what I had in mind. I also considered making it a config knob >>> (possibly just auto-selected based on $ARCH) whether to use >>> get_timer() or get_ticks(), but that becomes quite ugly. >> >> I hesitate a bit, moving with this generic code from get_timer() >> to get_ticks() for all boards. Did you test this patch on other >> platforms, like some ARM boards? >> >> Please note that I don't reject it - just asking. > > Yeah, I'm not really too happy about it myself, exactly because it > affects all arches/platforms. And no, I don't have other hardware handy > unfortunately. So it's very much an RFC where I hope someone with > knowledge of the various arches can say whether one can expect > get_ticks() to be at least as widely available as get_timer(). I'll try to test it on a non-powerpc platform soon'ish. Thanks, Stefan
On 05.06.20 14:13, Stefan Roese wrote: > On 05.06.20 14:11, Rasmus Villemoes wrote: >> On 05/06/2020 13.48, Stefan Roese wrote: >>> On 05.06.20 13:16, Rasmus Villemoes wrote: >>>> This is what I had in mind. I also considered making it a config knob >>>> (possibly just auto-selected based on $ARCH) whether to use >>>> get_timer() or get_ticks(), but that becomes quite ugly. >>> >>> I hesitate a bit, moving with this generic code from get_timer() >>> to get_ticks() for all boards. Did you test this patch on other >>> platforms, like some ARM boards? >>> >>> Please note that I don't reject it - just asking. >> >> Yeah, I'm not really too happy about it myself, exactly because it >> affects all arches/platforms. And no, I don't have other hardware handy >> unfortunately. So it's very much an RFC where I hope someone with >> knowledge of the various arches can say whether one can expect >> get_ticks() to be at least as widely available as get_timer(). > > I'll try to test it on a non-powerpc platform soon'ish. I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688) and it works there without any issues AFAICT. Two more remarks: Could you please run a build test with this patch applied for all boards (Travis, Azure...)? And do you see an issue, that the timer-wrap fixed by Chris with this patch [1] will not re-occur with your "ticks" approach? Thanks, Stefan [1] Author: Chris Packham <judge.packham@gmail.com> 2020-02-24 01:20:33 Committer: Stefan Roese <sr@denx.de> 2020-03-16 11:25:12 Parent: db41d65a97f335167e1fbc0400a83333b5157703 (common: Move hang() to the same header as panic()) Child: b4d9452c442769e6dc25649ac02db2d5ed5ea0c8 (watchdog: move initr_watchdog() to wdt-uclass.c) Branches: master, watchdog-ratelimit-2020-06-05-rfc and many more (84) Follows: v2020.04-rc3 Precedes: v2020.04-rc4 watchdog: Handle timer wrap around On some platforms/architectures the value from get_timer() can wrap. This is particularly problematic when long-running code needs to measure a time difference as is the case with watchdog_reset() which tries to avoid tickling the watchdog too frequently. Use time_after() from time.h instead of a plain > comparison to avoid any issues with the time wrapping on a system that has been sitting in u-boot for a long time. Signed-off-by: Chris Packham <judge.packham@gmail.com> Reviewed-by: Stefan Roese <sr@denx.de>
On 05/06/2020 15.37, Stefan Roese wrote: > On 05.06.20 14:13, Stefan Roese wrote: >> On 05.06.20 14:11, Rasmus Villemoes wrote: >>> On 05/06/2020 13.48, Stefan Roese wrote: >>>> On 05.06.20 13:16, Rasmus Villemoes wrote: >>>>> This is what I had in mind. I also considered making it a config knob >>>>> (possibly just auto-selected based on $ARCH) whether to use >>>>> get_timer() or get_ticks(), but that becomes quite ugly. >>>> >>>> I hesitate a bit, moving with this generic code from get_timer() >>>> to get_ticks() for all boards. Did you test this patch on other >>>> platforms, like some ARM boards? >>>> >>>> Please note that I don't reject it - just asking. >>> >>> Yeah, I'm not really too happy about it myself, exactly because it >>> affects all arches/platforms. And no, I don't have other hardware handy >>> unfortunately. So it's very much an RFC where I hope someone with >>> knowledge of the various arches can say whether one can expect >>> get_ticks() to be at least as widely available as get_timer(). >> >> I'll try to test it on a non-powerpc platform soon'ish. > > I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688) > and it works there without any issues AFAICT. Thanks. > Two more remarks: > > Could you please run a build test with this patch applied for all > boards (Travis, Azure...)? I may need a few more pointers than that. What am I supposed to do exactly? > And do you see an issue, that the timer-wrap fixed by Chris with this > patch [1] will not re-occur with your "ticks" approach? No, it will not re-occur, because this does the comparison the right way, namely by subtracting the absolute times and comparing that delta to the threshold (that's also what the time_after does under the hood). The wrong way is saying now >= last_reset + threshold (or as I think the code used to do, computing next_reset = now + threshold and then asking now >= next_reset). Rasmus
On 05.06.20 16:18, Rasmus Villemoes wrote: > On 05/06/2020 15.37, Stefan Roese wrote: >> On 05.06.20 14:13, Stefan Roese wrote: >>> On 05.06.20 14:11, Rasmus Villemoes wrote: >>>> On 05/06/2020 13.48, Stefan Roese wrote: >>>>> On 05.06.20 13:16, Rasmus Villemoes wrote: >>>>>> This is what I had in mind. I also considered making it a config knob >>>>>> (possibly just auto-selected based on $ARCH) whether to use >>>>>> get_timer() or get_ticks(), but that becomes quite ugly. >>>>> >>>>> I hesitate a bit, moving with this generic code from get_timer() >>>>> to get_ticks() for all boards. Did you test this patch on other >>>>> platforms, like some ARM boards? >>>>> >>>>> Please note that I don't reject it - just asking. >>>> >>>> Yeah, I'm not really too happy about it myself, exactly because it >>>> affects all arches/platforms. And no, I don't have other hardware handy >>>> unfortunately. So it's very much an RFC where I hope someone with >>>> knowledge of the various arches can say whether one can expect >>>> get_ticks() to be at least as widely available as get_timer(). >>> >>> I'll try to test it on a non-powerpc platform soon'ish. >> >> I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688) >> and it works there without any issues AFAICT. > > Thanks. > >> Two more remarks: >> >> Could you please run a build test with this patch applied for all >> boards (Travis, Azure...)? > > I may need a few more pointers than that. What am I supposed to do exactly? Not sure if there is some documentation on how to use the Travis, Gitlab or Azure build. It mainly uses buildman [1] internally, so you can use buildman locally if you don't want to push the build into the cloud. >> And do you see an issue, that the timer-wrap fixed by Chris with this >> patch [1] will not re-occur with your "ticks" approach? > > No, it will not re-occur, because this does the comparison the right > way, namely by subtracting the absolute times and comparing that delta > to the threshold (that's also what the time_after does under the hood). > The wrong way is saying now >= last_reset + threshold (or as I think the > code used to do, computing next_reset = now + threshold and then asking > now >= next_reset). Ah, good to know. Thanks for confirming. Thanks, Stefan [1] tools/buildman/README
On Fri, Jun 05, 2020 at 04:23:21PM +0200, Stefan Roese wrote: > On 05.06.20 16:18, Rasmus Villemoes wrote: > > On 05/06/2020 15.37, Stefan Roese wrote: > > > On 05.06.20 14:13, Stefan Roese wrote: > > > > On 05.06.20 14:11, Rasmus Villemoes wrote: > > > > > On 05/06/2020 13.48, Stefan Roese wrote: > > > > > > On 05.06.20 13:16, Rasmus Villemoes wrote: > > > > > > > This is what I had in mind. I also considered making it a config knob > > > > > > > (possibly just auto-selected based on $ARCH) whether to use > > > > > > > get_timer() or get_ticks(), but that becomes quite ugly. > > > > > > > > > > > > I hesitate a bit, moving with this generic code from get_timer() > > > > > > to get_ticks() for all boards. Did you test this patch on other > > > > > > platforms, like some ARM boards? > > > > > > > > > > > > Please note that I don't reject it - just asking. > > > > > > > > > > Yeah, I'm not really too happy about it myself, exactly because it > > > > > affects all arches/platforms. And no, I don't have other hardware handy > > > > > unfortunately. So it's very much an RFC where I hope someone with > > > > > knowledge of the various arches can say whether one can expect > > > > > get_ticks() to be at least as widely available as get_timer(). > > > > > > > > I'll try to test it on a non-powerpc platform soon'ish. > > > > > > I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688) > > > and it works there without any issues AFAICT. > > > > Thanks. > > > > > Two more remarks: > > > > > > Could you please run a build test with this patch applied for all > > > boards (Travis, Azure...)? > > > > I may need a few more pointers than that. What am I supposed to do exactly? > > Not sure if there is some documentation on how to use the Travis, > Gitlab or Azure build. It mainly uses buildman [1] internally, so you > can use buildman locally if you don't want to push the build into > the cloud. So, here's a recently-I-learned. If you submit a pull request on GitHub to https://github.com/u-boot/u-boot that will trigger Travis and Azure runs: https://travis-ci.org/github/u-boot/u-boot/pull_requests and somewhere under: https://dev.azure.com/u-boot/u-boot/_build?definitionId=2&_a=summary Note that sometimes Travis will have connectivity failures and you have to be me to tell it to re-run the job. Azure only has failures when we get those to-be-debugged random failures of sandbox/qemu targets.
On 05/06/2020 16.34, Tom Rini wrote: > On Fri, Jun 05, 2020 at 04:23:21PM +0200, Stefan Roese wrote: >> On 05.06.20 16:18, Rasmus Villemoes wrote: >>> On 05/06/2020 15.37, Stefan Roese wrote: >>>> Could you please run a build test with this patch applied for all >>>> boards (Travis, Azure...)? >>> >>> I may need a few more pointers than that. What am I supposed to do exactly? >> >> Not sure if there is some documentation on how to use the Travis, >> Gitlab or Azure build. It mainly uses buildman [1] internally, so you >> can use buildman locally if you don't want to push the build into >> the cloud. > > So, here's a recently-I-learned. If you submit a pull request on GitHub > to https://github.com/u-boot/u-boot that will trigger Travis and Azure > runs: > https://travis-ci.org/github/u-boot/u-boot/pull_requests > and somewhere under: > https://dev.azure.com/u-boot/u-boot/_build?definitionId=2&_a=summary > > Note that sometimes Travis will have connectivity failures and you have > to be me to tell it to re-run the job. Azure only has failures when we > get those to-be-debugged random failures of sandbox/qemu targets. > Thanks! FWIW, https://travis-ci.org/github/u-boot/u-boot/builds/695074829 and https://dev.azure.com/u-boot/u-boot/_build/results?buildId=713&view=results are all green - but how many of those actually build wdt-uclass.c I don't know. Rasmus
diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c index 4cdb7bd64c..7be4e9b5bc 100644 --- a/drivers/watchdog/wdt-uclass.c +++ b/drivers/watchdog/wdt-uclass.c @@ -16,14 +16,15 @@ DECLARE_GLOBAL_DATA_PTR; #define WATCHDOG_TIMEOUT_SECS (CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000) -/* - * Reset every 1000ms, or however often is required as indicated by a - * hw_margin_ms property. - */ -static ulong reset_period = 1000; +static u64 ratelimit_ticks; int initr_watchdog(void) { + /* + * Reset every 1000ms, or however often is required as indicated by a + * hw_margin_ms property. + */ + u32 reset_period = 1000; u32 timeout = WATCHDOG_TIMEOUT_SECS; /* @@ -48,6 +49,7 @@ int initr_watchdog(void) 4 * reset_period) / 4; } + ratelimit_ticks = usec_to_tick(reset_period * 1000); wdt_start(gd->watchdog_dev, timeout * 1000, 0); gd->flags |= GD_FLG_WDT_READY; printf("WDT: Started with%s servicing (%ds timeout)\n", @@ -117,17 +119,17 @@ int wdt_expire_now(struct udevice *dev, ulong flags) */ void watchdog_reset(void) { - static ulong next_reset; - ulong now; + static u64 last_reset; + u64 now; /* Exit if GD is not ready or watchdog is not initialized yet */ if (!gd || !(gd->flags & GD_FLG_WDT_READY)) return; /* Do not reset the watchdog too often */ - now = get_timer(0); - if (time_after(now, next_reset)) { - next_reset = now + reset_period; + now = get_ticks(); + if (now - last_reset >= ratelimit_ticks) { + last_reset = now; wdt_reset(gd->watchdog_dev); } }
On powerpc, get_timer() is implemented using a volatile variable that gets incremented from the decrementer interrupt handler. Hence, when interrupts are disabled, time as measured by get_timer() ceases to pass. Interrupts are necessarily disabled during bootm (see the comment in bootm_disable_interrupts() for why). But after interrupts are disabled, there's still lots of work to do - e.g. decompressing the kernel image to the right load address. Unfortunately, the rate-limiting logic in wdt_uclass.c's watchdog_reset function means that WATCHDOG_RESET() becomes a no-op, since get_timer() never progresses past the next_reset. This, combined with an external gpio-triggered watchdog that must be petted at least every 800ms, means our board gets reset before booting into linux. Now, at least on powerpc, get_ticks() continues to return sensible results whether or not interrupts are enabled. So this fixes the above problem for our board - but I don't know if get_ticks() can be assumed to always work. Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> --- This is what I had in mind. I also considered making it a config knob (possibly just auto-selected based on $ARCH) whether to use get_timer() or get_ticks(), but that becomes quite ugly. drivers/watchdog/wdt-uclass.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-)