Message ID | 20210413193827.2632540-3-rasmus.villemoes@prevas.dk |
---|---|
State | Superseded |
Delegated to: | Stefan Roese |
Headers | show |
Series | powerpc: introduce CONFIG_CACHE_FLUSH_WATCHDOG_THRESHOLD | expand |
On 13.04.21 21:38, Rasmus Villemoes wrote: > When flush_cache() is called during boot on our ~7M kernel image, the > hundreds of thousands of WATCHDOG_RESET calls end up adding > significantly to boottime. Flushing a single cache line doesn't take > many microseconds, so doing these calls for every cache line is > complete overkill. > > The generic watchdog_reset() provided by wdt-uclass.c actually > contains some rate-limiting logic that should in theory mitigate this, > but alas, that rate-limiting must be disabled on powerpc because of > its get_timer() implementation - get_timer() works just fine until > interrupts are disabled, but it just so happens that the "big" > flush_cache() call happens in the part of bootm where interrupts are > indeed disabled. [1] [2] [3] > > I have checked with objdump that the generated code doesn't change > when this option is left at its default value of 0: gcc is smart > enough to see that wd_limit is constant 0, the ">=" comparisons are > tautologically true, hence all assignments to "flushed" are eliminated Nitpicking: s/tautologically/autologically. BTW, I've never heard or read this word before. > as dead stores. > > On our board, setting the option to something like 65536 ends up > reducing total boottime by about 0.8 seconds. > > [1] https://patchwork.ozlabs.org/project/uboot/patch/20200605111657.28773-1-rasmus.villemoes@prevas.dk/ > [2] https://lists.denx.de/pipermail/u-boot/2021-April/446906.html > [3] https://lists.denx.de/pipermail/u-boot/2021-April/447280.html > > Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> > --- > arch/powerpc/Kconfig | 1 + > arch/powerpc/lib/Kconfig | 9 +++++++++ > arch/powerpc/lib/cache.c | 14 ++++++++++++-- > 3 files changed, 22 insertions(+), 2 deletions(-) > create mode 100644 arch/powerpc/lib/Kconfig > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index 6a2e88fed2..133447648c 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -49,5 +49,6 @@ source "arch/powerpc/cpu/mpc83xx/Kconfig" > source "arch/powerpc/cpu/mpc85xx/Kconfig" > source "arch/powerpc/cpu/mpc86xx/Kconfig" > source "arch/powerpc/cpu/mpc8xx/Kconfig" > +source "arch/powerpc/lib/Kconfig" > > endmenu > diff --git a/arch/powerpc/lib/Kconfig b/arch/powerpc/lib/Kconfig > new file mode 100644 > index 0000000000..b30b5edf7c > --- /dev/null > +++ b/arch/powerpc/lib/Kconfig > @@ -0,0 +1,9 @@ > +config CACHE_FLUSH_WATCHDOG_THRESHOLD > + int "Bytes to flush between WATCHDOG_RESET calls" > + default 0 > + help > + The flush_cache() function periodically, and by default for > + every cache line, calls WATCHDOG_RESET(). When flushing a > + large area, that may add a significant amount of > + overhead. This option allows you to set a threshold for how > + many bytes to flush between each WATCHDOG_RESET call. > diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c > index 3e487f50fe..115ae8e160 100644 > --- a/arch/powerpc/lib/cache.c > +++ b/arch/powerpc/lib/cache.c > @@ -12,6 +12,8 @@ > void flush_cache(ulong start_addr, ulong size) > { > ulong addr, start, end; > + ulong wd_limit = CONFIG_CACHE_FLUSH_WATCHDOG_THRESHOLD; > + ulong flushed = 0; > > start = start_addr & ~(CONFIG_SYS_CACHELINE_SIZE - 1); > end = start_addr + size - 1; > @@ -19,7 +21,11 @@ void flush_cache(ulong start_addr, ulong size) > for (addr = start; (addr <= end) && (addr >= start); > addr += CONFIG_SYS_CACHELINE_SIZE) { > asm volatile("dcbst 0,%0" : : "r" (addr) : "memory"); > - WATCHDOG_RESET(); > + flushed += CONFIG_SYS_CACHELINE_SIZE; > + if (flushed >= wd_limit) { > + WATCHDOG_RESET(); > + flushed = 0; > + } > } > /* wait for all dcbst to complete on bus */ > asm volatile("sync" : : : "memory"); > @@ -27,7 +33,11 @@ void flush_cache(ulong start_addr, ulong size) > for (addr = start; (addr <= end) && (addr >= start); > addr += CONFIG_SYS_CACHELINE_SIZE) { > asm volatile("icbi 0,%0" : : "r" (addr) : "memory"); > - WATCHDOG_RESET(); > + flushed += CONFIG_SYS_CACHELINE_SIZE; > + if (flushed >= wd_limit) { > + WATCHDOG_RESET(); > + flushed = 0; > + } It might make sense to pull these 2 identical code snippets into a function to not duplicate this code. Other than this: Reviewed-by: Stefan Roese <sr@denx.de> Thanks, Stefan
On 15/04/2021 07.34, Stefan Roese wrote: > On 13.04.21 21:38, Rasmus Villemoes wrote: >> >> I have checked with objdump that the generated code doesn't change >> when this option is left at its default value of 0: gcc is smart >> enough to see that wd_limit is constant 0, the ">=" comparisons are >> tautologically true, hence all assignments to "flushed" are eliminated > > Nitpicking: > s/tautologically/autologically. No. https://www.merriam-webster.com/dictionary/tautological https://en.wikipedia.org/wiki/Tautology_(logic) "In Mathematical logic, a tautology (from Greek: ταυτολογία) is a formula or assertion that is true in every possible interpretation." It even exists in gcc docs in the form of -Wtautological-compare. [But I do admit that "tautologically true" is a pleonasm; "tautologies" or "automatically true" could have been used instead. And going full meta, now that I look up pleonasm in wikipedia to check that I'm using it right, that article has a reference to "tautology (language)" in its second sentence.] >> /* wait for all dcbst to complete on bus */ >> asm volatile("sync" : : : "memory"); >> @@ -27,7 +33,11 @@ void flush_cache(ulong start_addr, ulong size) >> for (addr = start; (addr <= end) && (addr >= start); >> addr += CONFIG_SYS_CACHELINE_SIZE) { >> asm volatile("icbi 0,%0" : : "r" (addr) : "memory"); >> - WATCHDOG_RESET(); >> + flushed += CONFIG_SYS_CACHELINE_SIZE; >> + if (flushed >= wd_limit) { >> + WATCHDOG_RESET(); >> + flushed = 0; >> + } > > It might make sense to pull these 2 identical code snippets into a > function to not duplicate this code. Something like static ulong maybe_reset(ulong flushed) { const ulong wd_limit = ...; flushed += CONFIG_SYS_CACHELINE_SIZE; if (flushed >= wd_limit) { WATCHDOG_RESET(); flushed = 0; } return flushed; } and "flushed = maybe_reset(flushed);"? I don't think that improves readability. Thanks, Rasmus
On 15.04.21 08:46, Rasmus Villemoes wrote: > On 15/04/2021 07.34, Stefan Roese wrote: >> On 13.04.21 21:38, Rasmus Villemoes wrote: >>> >>> I have checked with objdump that the generated code doesn't change >>> when this option is left at its default value of 0: gcc is smart >>> enough to see that wd_limit is constant 0, the ">=" comparisons are >>> tautologically true, hence all assignments to "flushed" are eliminated >> >> Nitpicking: >> s/tautologically/autologically. > > No. > > https://www.merriam-webster.com/dictionary/tautological > https://en.wikipedia.org/wiki/Tautology_(logic) > "In Mathematical logic, a tautology (from Greek: ταυτολογία) is a > formula or assertion that is true in every possible interpretation." > > It even exists in gcc docs in the form of -Wtautological-compare. [But I > do admit that "tautologically true" is a pleonasm; "tautologies" or > "automatically true" could have been used instead. And going full meta, > now that I look up pleonasm in wikipedia to check that I'm using it > right, that article has a reference to "tautology (language)" in its > second sentence.] Interesting. Thanks for the detailed info. >>> /* wait for all dcbst to complete on bus */ >>> asm volatile("sync" : : : "memory"); >>> @@ -27,7 +33,11 @@ void flush_cache(ulong start_addr, ulong size) >>> for (addr = start; (addr <= end) && (addr >= start); >>> addr += CONFIG_SYS_CACHELINE_SIZE) { >>> asm volatile("icbi 0,%0" : : "r" (addr) : "memory"); >>> - WATCHDOG_RESET(); >>> + flushed += CONFIG_SYS_CACHELINE_SIZE; >>> + if (flushed >= wd_limit) { >>> + WATCHDOG_RESET(); >>> + flushed = 0; >>> + } >> >> It might make sense to pull these 2 identical code snippets into a >> function to not duplicate this code. > > Something like > > static ulong maybe_reset(ulong flushed) > { > const ulong wd_limit = ...; > flushed += CONFIG_SYS_CACHELINE_SIZE; > if (flushed >= wd_limit) { > WATCHDOG_RESET(); > flushed = 0; > } > return flushed; > } > > and "flushed = maybe_reset(flushed);"? I don't think that improves > readability. Yes, something like this. I agree that it's not really beautiful but for my personal taste it's a bit better. But I don't have hard feeling here. Let's see what other comment on this. Thanks, Stefan
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 6a2e88fed2..133447648c 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -49,5 +49,6 @@ source "arch/powerpc/cpu/mpc83xx/Kconfig" source "arch/powerpc/cpu/mpc85xx/Kconfig" source "arch/powerpc/cpu/mpc86xx/Kconfig" source "arch/powerpc/cpu/mpc8xx/Kconfig" +source "arch/powerpc/lib/Kconfig" endmenu diff --git a/arch/powerpc/lib/Kconfig b/arch/powerpc/lib/Kconfig new file mode 100644 index 0000000000..b30b5edf7c --- /dev/null +++ b/arch/powerpc/lib/Kconfig @@ -0,0 +1,9 @@ +config CACHE_FLUSH_WATCHDOG_THRESHOLD + int "Bytes to flush between WATCHDOG_RESET calls" + default 0 + help + The flush_cache() function periodically, and by default for + every cache line, calls WATCHDOG_RESET(). When flushing a + large area, that may add a significant amount of + overhead. This option allows you to set a threshold for how + many bytes to flush between each WATCHDOG_RESET call. diff --git a/arch/powerpc/lib/cache.c b/arch/powerpc/lib/cache.c index 3e487f50fe..115ae8e160 100644 --- a/arch/powerpc/lib/cache.c +++ b/arch/powerpc/lib/cache.c @@ -12,6 +12,8 @@ void flush_cache(ulong start_addr, ulong size) { ulong addr, start, end; + ulong wd_limit = CONFIG_CACHE_FLUSH_WATCHDOG_THRESHOLD; + ulong flushed = 0; start = start_addr & ~(CONFIG_SYS_CACHELINE_SIZE - 1); end = start_addr + size - 1; @@ -19,7 +21,11 @@ void flush_cache(ulong start_addr, ulong size) for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("dcbst 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + flushed += CONFIG_SYS_CACHELINE_SIZE; + if (flushed >= wd_limit) { + WATCHDOG_RESET(); + flushed = 0; + } } /* wait for all dcbst to complete on bus */ asm volatile("sync" : : : "memory"); @@ -27,7 +33,11 @@ void flush_cache(ulong start_addr, ulong size) for (addr = start; (addr <= end) && (addr >= start); addr += CONFIG_SYS_CACHELINE_SIZE) { asm volatile("icbi 0,%0" : : "r" (addr) : "memory"); - WATCHDOG_RESET(); + flushed += CONFIG_SYS_CACHELINE_SIZE; + if (flushed >= wd_limit) { + WATCHDOG_RESET(); + flushed = 0; + } } asm volatile("sync" : : : "memory"); /* flush prefetch queue */
When flush_cache() is called during boot on our ~7M kernel image, the hundreds of thousands of WATCHDOG_RESET calls end up adding significantly to boottime. Flushing a single cache line doesn't take many microseconds, so doing these calls for every cache line is complete overkill. The generic watchdog_reset() provided by wdt-uclass.c actually contains some rate-limiting logic that should in theory mitigate this, but alas, that rate-limiting must be disabled on powerpc because of its get_timer() implementation - get_timer() works just fine until interrupts are disabled, but it just so happens that the "big" flush_cache() call happens in the part of bootm where interrupts are indeed disabled. [1] [2] [3] I have checked with objdump that the generated code doesn't change when this option is left at its default value of 0: gcc is smart enough to see that wd_limit is constant 0, the ">=" comparisons are tautologically true, hence all assignments to "flushed" are eliminated as dead stores. On our board, setting the option to something like 65536 ends up reducing total boottime by about 0.8 seconds. [1] https://patchwork.ozlabs.org/project/uboot/patch/20200605111657.28773-1-rasmus.villemoes@prevas.dk/ [2] https://lists.denx.de/pipermail/u-boot/2021-April/446906.html [3] https://lists.denx.de/pipermail/u-boot/2021-April/447280.html Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> --- arch/powerpc/Kconfig | 1 + arch/powerpc/lib/Kconfig | 9 +++++++++ arch/powerpc/lib/cache.c | 14 ++++++++++++-- 3 files changed, 22 insertions(+), 2 deletions(-) create mode 100644 arch/powerpc/lib/Kconfig