Message ID | 20221031091103.667846-1-sander@svanheule.net |
---|---|
State | Accepted |
Delegated to: | Sander Vanheule |
Headers | show |
Series | realtek: mark clock source as continuous | expand |
Hi Sander, thanks for the patch! I tested on my Panasonic Switch-M48eG PN28480K (RTL8393M) and all issues are solved :) - top page loading of LuCI (WebUI) at login: 2.64 s - ethtool lanX (10x times): avg 0.00 s, max 0.01 s - /proc/interrupts (per (about) 1s): root@OpenWrt:/# while :; do grep "timer" /proc/interrupts; echo "---"; sleep 1; done 28: 0 388182 realtek-rtl-intc 28 timer@3100 29: 1179094 0 realtek-rtl-intc 29 timer@3100 --- 28: 0 388632 realtek-rtl-intc 28 timer@3100 29: 1181698 0 realtek-rtl-intc 29 timer@3100 --- 28: 0 389094 realtek-rtl-intc 28 timer@3100 29: 1184276 0 realtek-rtl-intc 29 timer@3100 --- 28: 0 389350 realtek-rtl-intc 28 timer@3100 29: 1185182 0 realtek-rtl-intc 29 timer@3100 --- 28: 0 389651 realtek-rtl-intc 28 timer@3100 29: 1186753 0 realtek-rtl-intc 29 timer@3100 --- - bootlog: https://gist.github.com/musashino205/2aa32ff858d863de9790c75c3a0b5bf8 (without the patch): https://gist.github.com/musashino205/0db98d9f99fbacc8a311f65739b72eb7 Thanks, Hiroshi On 2022/10/31 18:11, Sander Vanheule wrote: > After replacing the R4K event timer and clock source with the new > Realtek Otto timer, performance for RTL839x devices was severely > impacted, as reported by Hiroshi. > > Research by Markus showed that after commit 4657a5301eb5 (realtek: avoid > busy waiting for RTL839x PHY read/write, 2022-10-14), the ethernet > driver could only update a phy once per timer interval, which also > heavily impacted boot time. On e.g. a Zyxel GS1900-48, this added around > a minute to the time to fully initialise the switch. > > By marking the otto clocksource as continuous, the kernel enables it to > be used for high resolution timers. This allows readx_poll_timeout() to > sleep for less than one system timer interval, reducing system dead > time. > > Link: https://github.com/openwrt/openwrt/issues/11117 > Reported-by: INAGAKI Hiroshi <musashino.open@gmail.com> > Cc: Markus Stockhausen <markus.stockhausen@gmx.de> > Signed-off-by: Sander Vanheule <sander@svanheule.net> > --- > With this patch, initialisation time for my GS1900-48 drops from 110 > seconds to 50 seconds. Please check if you can reproduce this. The 'why > this works' from the commit message is from a quick look at the places > where this CLOCK_SOURCE flag is checked, so I hope it actually makes sense. > > Best, > Sander > > .../realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c > index 12eed78653d0..14e28e50f40e 100644 > --- a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c > +++ b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c > @@ -227,6 +227,7 @@ struct rttm_cs rttm_cs = { > .name = "realtek_otto_timer", > .rating = 400, > .mask = CLOCKSOURCE_MASK(RTTM_BIT_COUNT), > + .flags = CLOCK_SOURCE_IS_CONTINUOUS, > .read = rttm_read_clocksource, > .enable = rttm_enable_clocksource > }
On 31.10.22 at 10:11, Sander Vanheule wrote: > After replacing the R4K event timer and clock source with the new > Realtek Otto timer, performance for RTL839x devices was severely > impacted, as reported by Hiroshi. > > Research by Markus showed that after commit 4657a5301eb5 (realtek: avoid > busy waiting for RTL839x PHY read/write, 2022-10-14), the ethernet > driver could only update a phy once per timer interval, which also > heavily impacted boot time. On e.g. a Zyxel GS1900-48, this added around > a minute to the time to fully initialise the switch. > > By marking the otto clocksource as continuous, the kernel enables it to > be used for high resolution timers. This allows readx_poll_timeout() to > sleep for less than one system timer interval, reducing system dead > time. I can confirm this brings usleep_range sleep duration back to a more reasonable amount of time. Average time for usleep_range(10, 20); when called in a loop 10000 times (tested on HPE 1920-8G for RTL838x and 1920-48G for RTL839x): RTL838x RTL839x r4k timer 48 us 31 us otto timer 10003 us 10031 us otto timer with patch 60 us 32 us Average runtime of the polling loop in smi_wait_op is now also back to where it was with the r4k timer (under 0.3 ms for RTL838x and under 0.5 ms for RTL839x on otherwise idle device, compared to 5 ms / 10 ms when using the otto timer without the patch). > Link: https://github.com/openwrt/openwrt/issues/11117 > Reported-by: INAGAKI Hiroshi <musashino.open@gmail.com> > Cc: Markus Stockhausen <markus.stockhausen@gmx.de> > Signed-off-by: Sander Vanheule <sander@svanheule.net> > --- > With this patch, initialisation time for my GS1900-48 drops from 110 > seconds to 50 seconds. Please check if you can reproduce this. The 'why > this works' from the commit message is from a quick look at the places > where this CLOCK_SOURCE flag is checked, so I hope it actually makes sense. > > Best, > Sander > > .../realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c > index 12eed78653d0..14e28e50f40e 100644 > --- a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c > +++ b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c > @@ -227,6 +227,7 @@ struct rttm_cs rttm_cs = { > .name = "realtek_otto_timer", > .rating = 400, > .mask = CLOCKSOURCE_MASK(RTTM_BIT_COUNT), > + .flags = CLOCK_SOURCE_IS_CONTINUOUS, > .read = rttm_read_clocksource, > .enable = rttm_enable_clocksource > }
Hi Jan, On Mon, 2022-10-31 at 21:21 +0100, Jan Hoffmann wrote: > On 31.10.22 at 10:11, Sander Vanheule wrote: > > After replacing the R4K event timer and clock source with the new > > Realtek Otto timer, performance for RTL839x devices was severely > > impacted, as reported by Hiroshi. > > > > Research by Markus showed that after commit 4657a5301eb5 (realtek: avoid > > busy waiting for RTL839x PHY read/write, 2022-10-14), the ethernet > > driver could only update a phy once per timer interval, which also > > heavily impacted boot time. On e.g. a Zyxel GS1900-48, this added around > > a minute to the time to fully initialise the switch. > > > > By marking the otto clocksource as continuous, the kernel enables it to > > be used for high resolution timers. This allows readx_poll_timeout() to > > sleep for less than one system timer interval, reducing system dead > > time. > > I can confirm this brings usleep_range sleep duration back to a more > reasonable amount of time. > > Average time for usleep_range(10, 20); when called in a loop 10000 times > (tested on HPE 1920-8G for RTL838x and 1920-48G for RTL839x): > > RTL838x RTL839x > r4k timer 48 us 31 us > otto timer 10003 us 10031 us > otto timer with patch 60 us 32 us > > Average runtime of the polling loop in smi_wait_op is now also back to > where it was with the r4k timer (under 0.3 ms for RTL838x and under 0.5 > ms for RTL839x on otherwise idle device, compared to 5 ms / 10 ms when > using the otto timer without the patch). Thanks for doing these checks! Markus expressed his concern earlier today (on IRC) with the sleep duration in the polling loop. usleep_range((20 >> 2) + 1, 20) for a something that takes 300/500 us might be a bit fast, especially considering how large the overhead is (+40 us on RTL838x, +12 us on RTL839x) on a 10-20 us sleep. Would it make sense to you to bump the timeout for the smi poller to, say, 80 microseconds? Could be a different duration too, just something that strikes a good balance between not waiting too long, and not keeping the CPU busy setting up new timers. Best, Sander > > > Link: https://github.com/openwrt/openwrt/issues/11117 > > Reported-by: INAGAKI Hiroshi <musashino.open@gmail.com> > > Cc: Markus Stockhausen <markus.stockhausen@gmx.de> > > Signed-off-by: Sander Vanheule <sander@svanheule.net> > > --- > > With this patch, initialisation time for my GS1900-48 drops from 110 > > seconds to 50 seconds. Please check if you can reproduce this. The 'why > > this works' from the commit message is from a quick look at the places > > where this CLOCK_SOURCE flag is checked, so I hope it actually makes sense. > > > > Best, > > Sander > > > > .../realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl- > > otto.c b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl- > > otto.c > > index 12eed78653d0..14e28e50f40e 100644 > > --- a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c > > +++ b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c > > @@ -227,6 +227,7 @@ struct rttm_cs rttm_cs = { > > .name = "realtek_otto_timer", > > .rating = 400, > > .mask = CLOCKSOURCE_MASK(RTTM_BIT_COUNT), > > + .flags = CLOCK_SOURCE_IS_CONTINUOUS, > > .read = rttm_read_clocksource, > > .enable = rttm_enable_clocksource > > }
diff --git a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c index 12eed78653d0..14e28e50f40e 100644 --- a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c +++ b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c @@ -227,6 +227,7 @@ struct rttm_cs rttm_cs = { .name = "realtek_otto_timer", .rating = 400, .mask = CLOCKSOURCE_MASK(RTTM_BIT_COUNT), + .flags = CLOCK_SOURCE_IS_CONTINUOUS, .read = rttm_read_clocksource, .enable = rttm_enable_clocksource }
After replacing the R4K event timer and clock source with the new Realtek Otto timer, performance for RTL839x devices was severely impacted, as reported by Hiroshi. Research by Markus showed that after commit 4657a5301eb5 (realtek: avoid busy waiting for RTL839x PHY read/write, 2022-10-14), the ethernet driver could only update a phy once per timer interval, which also heavily impacted boot time. On e.g. a Zyxel GS1900-48, this added around a minute to the time to fully initialise the switch. By marking the otto clocksource as continuous, the kernel enables it to be used for high resolution timers. This allows readx_poll_timeout() to sleep for less than one system timer interval, reducing system dead time. Link: https://github.com/openwrt/openwrt/issues/11117 Reported-by: INAGAKI Hiroshi <musashino.open@gmail.com> Cc: Markus Stockhausen <markus.stockhausen@gmx.de> Signed-off-by: Sander Vanheule <sander@svanheule.net> --- With this patch, initialisation time for my GS1900-48 drops from 110 seconds to 50 seconds. Please check if you can reproduce this. The 'why this works' from the commit message is from a quick look at the places where this CLOCK_SOURCE flag is checked, so I hope it actually makes sense. Best, Sander .../realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c | 1 + 1 file changed, 1 insertion(+)