mbox series

[0/8] ntp/rtc: Fixes and cleanups

Message ID 20201206214613.444124194@linutronix.de
Headers show
Series ntp/rtc: Fixes and cleanups | expand

Message

Thomas Gleixner Dec. 6, 2020, 9:46 p.m. UTC
Miroslav ran into a situation where the periodic RTC synchronization almost
never was able to hit the time window for the update. That happens due the
usage of delayed_work and the properties of the timer wheel.

While that particular problem is halfways simple to fix this started to
unearth other problems with that code particularly with rtc_set_npt_time()
but expanded into other things as well.

  1) The update offset for rtc-cmos is off by a full second

  2) The readout of MC146818 (rtc-cmos and arch code) is broken and can
     return garbage.

  2) Alexandre questioned the approach in general and wants to get rid of
     it. Of course there are better methods to do that and it can be
     completely done in user space.

     Unfortunately it's not that simple as this would be a user visible
     change, so making it at least halfways correct.

  3) Alexandre requested to move that code into the NTP code as this is not
     really RTC functionality and just usage of the RTC API.

  4) The update offset itself was questioned, but the time between the
     write and the next seconds increment in the RTC is fundamentaly a
     hardware property. The transport time, which is pretty irrelevant for
     direct accessible RTCs (rtc-cmos), but relevant for RTC behind i2c/SPI
     needs to be added on top.

     It's undebated that this transport time cannot be correctly estimated,
     but right now it's 500ms which is far off. The correct transport time
     can be calibrated, a halfways correct value supplied via DT, but
     that's an orthogonal problem.

The following series addresses the above:

    1) Fix the readout function of MC146818
    2) Fix the rtc-cmos offset
    3) Reduce the default transport time

    4) Switch the NTP periodic sync code to a hrtimer/work combo

    5) Move rtc_set_npt_time() to the ntp code
    6) Make the update offset more intuitive
    7) Simplify the whole machinery
     
Thanks,

	tglx
---
 a/drivers/rtc/systohc.c        |   61 ----------
 drivers/rtc/Makefile           |    1 
 drivers/rtc/class.c            |    9 +
 drivers/rtc/rtc-cmos.c         |    3 
 drivers/rtc/rtc-mc146818-lib.c |   70 +++++++-----
 include/linux/rtc.h            |   69 +++++-------
 include/linux/timex.h          |    1 
 kernel/time/ntp.c              |  232 ++++++++++++++++++++++++-----------------
 kernel/time/ntp_internal.h     |    7 +
 9 files changed, 225 insertions(+), 228 deletions(-)

Comments

Thomas Gleixner Dec. 9, 2020, 12:33 a.m. UTC | #1
Alexandre,

On Sun, Dec 06 2020 at 22:46, Thomas Gleixner wrote:
> Miroslav ran into a situation where the periodic RTC synchronization almost
> never was able to hit the time window for the update. That happens due the
> usage of delayed_work and the properties of the timer wheel.
>
> While that particular problem is halfways simple to fix this started to
> unearth other problems with that code particularly with rtc_set_npt_time()
> but expanded into other things as well.
>
>   1) The update offset for rtc-cmos is off by a full second
>
>   2) The readout of MC146818 (rtc-cmos and arch code) is broken and can
>      return garbage.
>
>   2) Alexandre questioned the approach in general and wants to get rid of
>      it. Of course there are better methods to do that and it can be
>      completely done in user space.
>
>      Unfortunately it's not that simple as this would be a user visible
>      change, so making it at least halfways correct.
>
>   3) Alexandre requested to move that code into the NTP code as this is not
>      really RTC functionality and just usage of the RTC API.
>
>   4) The update offset itself was questioned, but the time between the
>      write and the next seconds increment in the RTC is fundamentaly a
>      hardware property. The transport time, which is pretty irrelevant for
>      direct accessible RTCs (rtc-cmos), but relevant for RTC behind i2c/SPI
>      needs to be added on top.
>
>      It's undebated that this transport time cannot be correctly estimated,
>      but right now it's 500ms which is far off. The correct transport time
>      can be calibrated, a halfways correct value supplied via DT, but
>      that's an orthogonal problem.
>
> The following series addresses the above:
>
>     1) Fix the readout function of MC146818
>     2) Fix the rtc-cmos offset
>     3) Reduce the default transport time
>
>     4) Switch the NTP periodic sync code to a hrtimer/work combo
>
>     5) Move rtc_set_npt_time() to the ntp code
>     6) Make the update offset more intuitive
>     7) Simplify the whole machinery

any opinion on this?

Thanks,

        tglx
Alexandre Belloni Dec. 9, 2020, 4:01 a.m. UTC | #2
On 09/12/2020 01:33:08+0100, Thomas Gleixner wrote:
> Alexandre,
> 
> On Sun, Dec 06 2020 at 22:46, Thomas Gleixner wrote:
> > Miroslav ran into a situation where the periodic RTC synchronization almost
> > never was able to hit the time window for the update. That happens due the
> > usage of delayed_work and the properties of the timer wheel.
> >
> > While that particular problem is halfways simple to fix this started to
> > unearth other problems with that code particularly with rtc_set_npt_time()
> > but expanded into other things as well.
> >
> >   1) The update offset for rtc-cmos is off by a full second
> >
> >   2) The readout of MC146818 (rtc-cmos and arch code) is broken and can
> >      return garbage.
> >
> >   2) Alexandre questioned the approach in general and wants to get rid of
> >      it. Of course there are better methods to do that and it can be
> >      completely done in user space.
> >
> >      Unfortunately it's not that simple as this would be a user visible
> >      change, so making it at least halfways correct.
> >
> >   3) Alexandre requested to move that code into the NTP code as this is not
> >      really RTC functionality and just usage of the RTC API.
> >
> >   4) The update offset itself was questioned, but the time between the
> >      write and the next seconds increment in the RTC is fundamentaly a
> >      hardware property. The transport time, which is pretty irrelevant for
> >      direct accessible RTCs (rtc-cmos), but relevant for RTC behind i2c/SPI
> >      needs to be added on top.
> >
> >      It's undebated that this transport time cannot be correctly estimated,
> >      but right now it's 500ms which is far off. The correct transport time
> >      can be calibrated, a halfways correct value supplied via DT, but
> >      that's an orthogonal problem.
> >
> > The following series addresses the above:
> >
> >     1) Fix the readout function of MC146818
> >     2) Fix the rtc-cmos offset
> >     3) Reduce the default transport time
> >
> >     4) Switch the NTP periodic sync code to a hrtimer/work combo
> >
> >     5) Move rtc_set_npt_time() to the ntp code
> >     6) Make the update offset more intuitive
> >     7) Simplify the whole machinery
> 
> any opinion on this?
> 

This looks very good to me, however, I think the 10ms offset is a bit
too much. Do you mind waiting one or two days so I can get my test setup
back up?
Thomas Gleixner Dec. 9, 2020, 12:39 p.m. UTC | #3
On Wed, Dec 09 2020 at 05:01, Alexandre Belloni wrote:
> On 09/12/2020 01:33:08+0100, Thomas Gleixner wrote:
>> > The following series addresses the above:
>> >
>> >     1) Fix the readout function of MC146818
>> >     2) Fix the rtc-cmos offset
>> >     3) Reduce the default transport time
>> >
>> >     4) Switch the NTP periodic sync code to a hrtimer/work combo
>> >
>> >     5) Move rtc_set_npt_time() to the ntp code
>> >     6) Make the update offset more intuitive
>> >     7) Simplify the whole machinery
>> 
>> any opinion on this?
>> 
> This looks very good to me, however, I think the 10ms offset is a bit
> too much.

I pulled it out of thin air TBH. It's at least 50 times more accurate
than what we had before :)

> Do you mind waiting one or two days so I can get my test setup
> back up?

No problem.

Thanks,

        tglx