diff mbox series

realtek: mark clock source as continuous

Message ID 20221031091103.667846-1-sander@svanheule.net
State Accepted
Delegated to: Sander Vanheule
Headers show
Series realtek: mark clock source as continuous | expand

Commit Message

Sander Vanheule Oct. 31, 2022, 9:11 a.m. UTC
After replacing the R4K event timer and clock source with the new
Realtek Otto timer, performance for RTL839x devices was severely
impacted, as reported by Hiroshi.

Research by Markus showed that after commit 4657a5301eb5 (realtek: avoid
busy waiting for RTL839x PHY read/write, 2022-10-14), the ethernet
driver could only update a phy once per timer interval, which also
heavily impacted boot time. On e.g. a Zyxel GS1900-48, this added around
a minute to the time to fully initialise the switch.

By marking the otto clocksource as continuous, the kernel enables it to
be used for high resolution timers. This allows readx_poll_timeout() to
sleep for less than one system timer interval, reducing system dead
time.

Link: https://github.com/openwrt/openwrt/issues/11117
Reported-by: INAGAKI Hiroshi <musashino.open@gmail.com>
Cc: Markus Stockhausen <markus.stockhausen@gmx.de>
Signed-off-by: Sander Vanheule <sander@svanheule.net>
---
With this patch, initialisation time for my GS1900-48 drops from 110
seconds to 50 seconds. Please check if you can reproduce this. The 'why
this works' from the commit message is from a quick look at the places
where this CLOCK_SOURCE flag is checked, so I hope it actually makes sense.

Best,
Sander

 .../realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c      | 1 +
 1 file changed, 1 insertion(+)

Comments

INAGAKI Hiroshi Oct. 31, 2022, 1:09 p.m. UTC | #1
Hi Sander,

thanks for the patch!

I tested on my Panasonic Switch-M48eG PN28480K (RTL8393M) and all 
issues are solved :)

- top page loading of LuCI (WebUI) at login: 2.64 s
- ethtool lanX (10x times): avg 0.00 s, max 0.01 s
- /proc/interrupts (per (about) 1s):
   root@OpenWrt:/# while :; do grep "timer" /proc/interrupts; echo 
"---"; sleep 1; done
    28:          0     388182  realtek-rtl-intc  28  timer@3100
    29:    1179094          0  realtek-rtl-intc  29  timer@3100
   ---
    28:          0     388632  realtek-rtl-intc  28  timer@3100
    29:    1181698          0  realtek-rtl-intc  29  timer@3100
   ---
    28:          0     389094  realtek-rtl-intc  28  timer@3100
    29:    1184276          0  realtek-rtl-intc  29  timer@3100
   ---
    28:          0     389350  realtek-rtl-intc  28  timer@3100
    29:    1185182          0  realtek-rtl-intc  29  timer@3100
   ---
    28:          0     389651  realtek-rtl-intc  28  timer@3100
    29:    1186753          0  realtek-rtl-intc  29  timer@3100
   ---
- bootlog:
https://gist.github.com/musashino205/2aa32ff858d863de9790c75c3a0b5bf8

   (without the patch):
https://gist.github.com/musashino205/0db98d9f99fbacc8a311f65739b72eb7

Thanks,
Hiroshi

On 2022/10/31 18:11, Sander Vanheule wrote:
> After replacing the R4K event timer and clock source with the new
> Realtek Otto timer, performance for RTL839x devices was severely
> impacted, as reported by Hiroshi.
>
> Research by Markus showed that after commit 4657a5301eb5 (realtek: avoid
> busy waiting for RTL839x PHY read/write, 2022-10-14), the ethernet
> driver could only update a phy once per timer interval, which also
> heavily impacted boot time. On e.g. a Zyxel GS1900-48, this added around
> a minute to the time to fully initialise the switch.
>
> By marking the otto clocksource as continuous, the kernel enables it to
> be used for high resolution timers. This allows readx_poll_timeout() to
> sleep for less than one system timer interval, reducing system dead
> time.
>
> Link: https://github.com/openwrt/openwrt/issues/11117
> Reported-by: INAGAKI Hiroshi <musashino.open@gmail.com>
> Cc: Markus Stockhausen <markus.stockhausen@gmx.de>
> Signed-off-by: Sander Vanheule <sander@svanheule.net>
> ---
> With this patch, initialisation time for my GS1900-48 drops from 110
> seconds to 50 seconds. Please check if you can reproduce this. The 'why
> this works' from the commit message is from a quick look at the places
> where this CLOCK_SOURCE flag is checked, so I hope it actually makes sense.
>
> Best,
> Sander
>
>   .../realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c      | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c
> index 12eed78653d0..14e28e50f40e 100644
> --- a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c
> +++ b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c
> @@ -227,6 +227,7 @@ struct rttm_cs rttm_cs = {
>   		.name	= "realtek_otto_timer",
>   		.rating	= 400,
>   		.mask	= CLOCKSOURCE_MASK(RTTM_BIT_COUNT),
> +		.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
>   		.read	= rttm_read_clocksource,
>   		.enable	= rttm_enable_clocksource
>   	}
Jan Hoffmann Oct. 31, 2022, 8:21 p.m. UTC | #2
On 31.10.22 at 10:11, Sander Vanheule wrote:
> After replacing the R4K event timer and clock source with the new
> Realtek Otto timer, performance for RTL839x devices was severely
> impacted, as reported by Hiroshi.
> 
> Research by Markus showed that after commit 4657a5301eb5 (realtek: avoid
> busy waiting for RTL839x PHY read/write, 2022-10-14), the ethernet
> driver could only update a phy once per timer interval, which also
> heavily impacted boot time. On e.g. a Zyxel GS1900-48, this added around
> a minute to the time to fully initialise the switch.
> 
> By marking the otto clocksource as continuous, the kernel enables it to
> be used for high resolution timers. This allows readx_poll_timeout() to
> sleep for less than one system timer interval, reducing system dead
> time.

I can confirm this brings usleep_range sleep duration back to a more 
reasonable amount of time.

Average time for usleep_range(10, 20); when called in a loop 10000 times 
(tested on HPE 1920-8G for RTL838x and 1920-48G for RTL839x):

                           RTL838x     RTL839x
r4k timer                   48 us       31 us
otto timer               10003 us    10031 us
otto timer with patch       60 us       32 us

Average runtime of the polling loop in smi_wait_op is now also back to 
where it was with the r4k timer (under 0.3 ms for RTL838x and under 0.5 
ms for RTL839x on otherwise idle device, compared to 5 ms / 10 ms when 
using the otto timer without the patch).

> Link: https://github.com/openwrt/openwrt/issues/11117
> Reported-by: INAGAKI Hiroshi <musashino.open@gmail.com>
> Cc: Markus Stockhausen <markus.stockhausen@gmx.de>
> Signed-off-by: Sander Vanheule <sander@svanheule.net>
> ---
> With this patch, initialisation time for my GS1900-48 drops from 110
> seconds to 50 seconds. Please check if you can reproduce this. The 'why
> this works' from the commit message is from a quick look at the places
> where this CLOCK_SOURCE flag is checked, so I hope it actually makes sense.
> 
> Best,
> Sander
> 
>   .../realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c      | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c
> index 12eed78653d0..14e28e50f40e 100644
> --- a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c
> +++ b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c
> @@ -227,6 +227,7 @@ struct rttm_cs rttm_cs = {
>   		.name	= "realtek_otto_timer",
>   		.rating	= 400,
>   		.mask	= CLOCKSOURCE_MASK(RTTM_BIT_COUNT),
> +		.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
>   		.read	= rttm_read_clocksource,
>   		.enable	= rttm_enable_clocksource
>   	}
Sander Vanheule Oct. 31, 2022, 8:40 p.m. UTC | #3
Hi Jan,

On Mon, 2022-10-31 at 21:21 +0100, Jan Hoffmann wrote:
> On 31.10.22 at 10:11, Sander Vanheule wrote:
> > After replacing the R4K event timer and clock source with the new
> > Realtek Otto timer, performance for RTL839x devices was severely
> > impacted, as reported by Hiroshi.
> > 
> > Research by Markus showed that after commit 4657a5301eb5 (realtek: avoid
> > busy waiting for RTL839x PHY read/write, 2022-10-14), the ethernet
> > driver could only update a phy once per timer interval, which also
> > heavily impacted boot time. On e.g. a Zyxel GS1900-48, this added around
> > a minute to the time to fully initialise the switch.
> > 
> > By marking the otto clocksource as continuous, the kernel enables it to
> > be used for high resolution timers. This allows readx_poll_timeout() to
> > sleep for less than one system timer interval, reducing system dead
> > time.
> 
> I can confirm this brings usleep_range sleep duration back to a more 
> reasonable amount of time.
> 
> Average time for usleep_range(10, 20); when called in a loop 10000 times 
> (tested on HPE 1920-8G for RTL838x and 1920-48G for RTL839x):
> 
>                            RTL838x     RTL839x
> r4k timer                   48 us       31 us
> otto timer               10003 us    10031 us
> otto timer with patch       60 us       32 us
> 
> Average runtime of the polling loop in smi_wait_op is now also back to 
> where it was with the r4k timer (under 0.3 ms for RTL838x and under 0.5 
> ms for RTL839x on otherwise idle device, compared to 5 ms / 10 ms when 
> using the otto timer without the patch).

Thanks for doing these checks!

Markus expressed his concern earlier today (on IRC) with the sleep duration in
the polling loop. usleep_range((20 >> 2) + 1, 20) for a something that takes
300/500 us might be a bit fast, especially considering how large the overhead is
(+40 us on RTL838x, +12 us on RTL839x) on a 10-20 us sleep.

Would it make sense to you to bump the timeout for the smi poller to, say, 80
microseconds? Could be a different duration too, just something that strikes a
good balance between not waiting too long, and not keeping the CPU busy setting
up new timers.

Best,
Sander

> 
> > Link: https://github.com/openwrt/openwrt/issues/11117
> > Reported-by: INAGAKI Hiroshi <musashino.open@gmail.com>
> > Cc: Markus Stockhausen <markus.stockhausen@gmx.de>
> > Signed-off-by: Sander Vanheule <sander@svanheule.net>
> > ---
> > With this patch, initialisation time for my GS1900-48 drops from 110
> > seconds to 50 seconds. Please check if you can reproduce this. The 'why
> > this works' from the commit message is from a quick look at the places
> > where this CLOCK_SOURCE flag is checked, so I hope it actually makes sense.
> > 
> > Best,
> > Sander
> > 
> >   .../realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c      | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-
> > otto.c b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-
> > otto.c
> > index 12eed78653d0..14e28e50f40e 100644
> > --- a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c
> > +++ b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c
> > @@ -227,6 +227,7 @@ struct rttm_cs rttm_cs = {
> >                 .name   = "realtek_otto_timer",
> >                 .rating = 400,
> >                 .mask   = CLOCKSOURCE_MASK(RTTM_BIT_COUNT),
> > +               .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
> >                 .read   = rttm_read_clocksource,
> >                 .enable = rttm_enable_clocksource
> >         }
diff mbox series

Patch

diff --git a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c
index 12eed78653d0..14e28e50f40e 100644
--- a/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c
+++ b/target/linux/realtek/files-5.10/drivers/clocksource/timer-rtl-otto.c
@@ -227,6 +227,7 @@  struct rttm_cs rttm_cs = {
 		.name	= "realtek_otto_timer",
 		.rating	= 400,
 		.mask	= CLOCKSOURCE_MASK(RTTM_BIT_COUNT),
+		.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
 		.read	= rttm_read_clocksource,
 		.enable	= rttm_enable_clocksource
 	}