[0/7] TC-ETF support PTP clocks series

Message ID 20201001205141.8885-1-erez.geva.ext@siemens.com
Series TC-ETF support PTP clocks series

Message

Geva, Erez Oct. 1, 2020, 8:51 p.m. UTC
Add support for using a PTP clock with
 the Traffic Control Earliest TxTime First (ETF) Qdisc.

Why do we need ETF to use a PTP clock?
Currently, ETF requires the system clock to be synchronized
 to the PTP Hardware Clock (PHC) of the NIC we want to send through.
But there are cases where we cannot synchronize the system clock with
 the desired NIC PHC:
1. We use several NICs in several PTP domains, in which our device
    is not allowed to be the PTP master,
   and we are not allowed to synchronize these PTP domains.
2. We use another clock source which our system needs,
   yet our device is not allowed to be the PTP master.
Regardless of the exact topology, as the Linux tradition is to give
 the user the freedom to choose, we propose a patch that allows
 the user to configure TC-ETF with a PTP clock as well as
 with the system clock.
* NOTE: we do encourage users to synchronize the system clock with
  a PTP clock,
 as the ETF watchdog uses the system clock.
 Synchronizing the system clock with a PTP clock will probably
  reduce the frequency difference between the PHC and the system clock.
 As a consequence, the user might be able to reduce the ETF delta time
  and the packet latency across the network.
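A back-of-the-envelope view of why the frequency difference matters: if the system clock and the PHC differ in frequency by a fraction Δf, then a time T after they were last aligned they disagree by roughly the product of the two. The numbers below are purely illustrative, not measurements from this series.

```latex
e(T) \approx \Delta f \cdot T,
\qquad \text{e.g.}\quad \Delta f = 10\,\mathrm{ppm},\; T = 1\,\mathrm{ms}
\;\Rightarrow\; e \approx 10^{-5} \cdot 10^{-3}\,\mathrm{s} = 10\,\mathrm{ns}.
```

Since the watchdog runs on the system clock while the launch times refer to the PHC, the ETF delta presumably has to absorb an error of this kind on top of the software latency, so a smaller Δf leaves room for a smaller delta.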

Following the decision to derive a dynamic clock ID from the file descriptor
 of an opened PTP clock device file,
 we propose a simple way to use the dynamic clock ID with the ETF Qdisc.
We will submit a patch to the "tc" tool from the iproute2 project
 once this patch is accepted.

The patches contain:
1. Add a function to verify that a dynamic clock ID is derived
    from a file descriptor.
   The function follows the clock ID convention for dynamic clock IDs
    derived from file descriptors.
   The function will be used in the second patch.

2. A function to get the main system oscillator calibration state.

3. Functions to get and put a POSIX clock reference to
    a PTP Hardware Clock (PHC).
   The get function uses a dynamic clock ID created by user space.
   The purpose is that a module can hold a POSIX clock reference after the
    configuration application has closed the PTP clock device file,
    after which the dynamic clock ID can no longer be used.
   The POSIX clock references are used by TC-ETF.

4. A fix for the range check in qdisc_watchdog_schedule_range_ns().

5. While testing ETF, we noticed an issue with the high-resolution timer
    that the ETF Qdisc watchdog uses.
   The timer was set for a sleep of 300 nanoseconds but
    ended up sleeping for 3 milliseconds.
   The problem happens when the timer is already active and
    the current expiry is earlier than the new expiry.
   So we add a new TC schedule function that does not reprogram the timer
    under these conditions.
   Using this function makes sense, as the Qdisc watchdog does act as a
    watchdog:
   the Qdisc watchdog may expire early;
   however, if the watchdog is late, packets are dropped.

6. Add a kernel configuration option for the TC-ETF watchdog range.
   As the range is a characteristic of the hardware,
   that seems to be the proper way.

7. Add support for using a PHC with TC-ETF.

Erez Geva (7):
  POSIX clock ID check function
  Function to retrieve main clock state
  Functions to fetch POSIX dynamic clock object
  Fix qdisc_watchdog_schedule_range_ns range check
  Traffic control using high-resolution timer issue
  TC-ETF code improvements
  TC-ETF support PTP clocks

 include/linux/posix-clock.h     |  39 +++++++++
 include/linux/posix-timers.h    |   5 ++
 include/linux/timex.h           |   1 +
 include/net/pkt_sched.h         |   2 +
 include/uapi/linux/net_tstamp.h |   5 ++
 kernel/time/posix-clock.c       |  76 ++++++++++++++++
 kernel/time/posix-timers.c      |   2 +-
 kernel/time/timekeeping.c       |   9 ++
 net/sched/Kconfig               |   8 ++
 net/sched/sch_api.c             |  36 +++++++-
 net/sched/sch_etf.c             | 148 +++++++++++++++++++++++++-------
 11 files changed, 298 insertions(+), 33 deletions(-)


base-commit: a1b8638ba1320e6684aa98233c15255eb803fac7

Comments

Vinicius Costa Gomes Oct. 2, 2020, 7:01 p.m. UTC | #1
Hi Erez,

Erez Geva <erez.geva.ext@siemens.com> writes:

> Add support for using PTP clock with
>  Traffic control Earliest TxTime First (ETF) Qdisc.
>

In addition to what Thomas said, I would like to add some thoughts
(mostly re-wording some of Thomas' comments :-)).

I think that there's an underlying problem/limitation that causes the
issue you are trying to solve (or solving it would at least be a step in
the right direction): PTP clocks can't be used as hrtimers.

I didn't spend a lot of time thinking about how to solve this (the only
thing that comes to mind is having a timecounter, or similar, "software
view" over the PHC clock).

Anyway, my feeling is that until this is solved, we would only be
working around the problem and creating even more hard-to-handle corner
cases.


Cheers,
Geva, Erez Oct. 2, 2020, 7:56 p.m. UTC | #2
On 02/10/2020 21:01, Vinicius Costa Gomes wrote:
> In addition to what Thomas said, I would like to add some thoughts
> (mostly re-wording some of Thomas' comments :-)).
>
> I think that there's an underlying problem/limitation that is the cause
> of the issue (or at least a step in the right direction) you are trying
> to solve: the issue is that PTP clocks can't be used as hrtimers.
>
> I didn't spend a lot of time thinking about how to solve this (the only
> thing that comes to mind is having a timecounter, or similar, "software
> view" over the PHC clock).
>
> Anyway, my feeling is that until this is solved, we would only be
> working around the problem, and creating even more hard to handle corner
> cases.

You are right.

Thanks for the insight.

Erez
Thomas Gleixner Oct. 3, 2020, 12:10 a.m. UTC | #3
Vinicius,

On Fri, Oct 02 2020 at 12:01, Vinicius Costa Gomes wrote:
> I think that there's an underlying problem/limitation that is the cause
> of the issue (or at least a step in the right direction) you are trying
> to solve: the issue is that PTP clocks can't be used as hrtimers.

That's only an issue if PTP time != CLOCK_TAI, which is insane to begin
with.

I know that these insanities exist in real-world setups, e.g. grandmaster
clocks which start at the epoch, which causes complete disaster
when any of the slave devices booted earlier. Obviously people came
up with system designs which are even more insane.

> I didn't spend a lot of time thinking about how to solve this (the only
> thing that comes to mind is having a timecounter, or similar, "software
> view" over the PHC clock).

There are two aspects:

 1) What's the overall time coordination especially for applications?

    PTP is based on TAI for a reason: TAI allows a universal
    representation of time. Strictly monotonic, no time zones, no leap
    seconds, no bells and whistles.

    Using TAI in distributed systems solved a gazillion hard problems
    in one go.

    TSN depends on PTP and that obviously makes CLOCK_TAI _the_ clock of
    choice for schedules and whatever is needed. It just solves the
    problem nicely and we spent a great amount of time making
    application development for TSN reasonable and hardware agnostic.

    Now industry comes along and decides to introduce independent time
    universes. The result is a black hole for programmers because they
    now have to waste effort - again - on solving the incredibly hard
    problems of warping space and time.

    The amount of money saved by not having properly coordinated time
    bases in such systems is definitely marginal compared to the amount
    of nonsensical work required to fix it in software.

 2) How can an OS provide halfway usable interfaces to handle this
    trainwreck?

    Access to the various time universes is already available through
    the dynamic POSIX clocks. But these interfaces have been designed
    for the performance-insensitive work of PTP daemons and not for the
    performance-critical work of applications dealing with real-time
    requirements of all sorts.

    As these raw PTP clocks are hardware dependent and only known at
    boot / device discovery time, they cannot be exposed to the kernel
    internally in any sane way. Also the user space interface has to be
    dynamic, which rules out the ability to assign fixed CLOCK_* IDs.

    As a consequence these clocks cannot provide timers like the regular
    CLOCK_* variants do, which makes it insanely hard to develop sane
    and portable applications.

    What comes to my mind (without spending much thought on it) is:

       1) Utilize and extend the existing PTP mechanisms to calculate
          the time relationship between the system-wide CLOCK_TAI and
          the uncoordinated time universe. As the offset is constant and
          frequency drift is not a high-speed problem, this can be done
          with a user space daemon of some sort.

       2) Provide CLOCK_TAI_PRIVATE, which defaults to CLOCK_TAI,
          i.e. offset = 0 and frequency ratio = 1:1.

       3) (Ab)use the existing time namespace to provide a mechanism to
          adjust the offset and frequency ratio of CLOCK_TAI_PRIVATE,
          as calculated by #1.

          This is the really tricky part and comes with severe
          limitations:

            - We can't walk the task list to find tasks which have their
              CLOCK_TAI_PRIVATE associated with a particular
              incarnation of a PHC/PTP universe, so some sane referencing
              of the underlying parameters to convert TAI to
              TAI_PRIVATE and vice versa has to be found. Lifetime
              problems are going to be interesting to deal with.

            - An application cannot coordinate multiple PHC/PTP domains
              and has to restrict itself to picking ONE disjoint time
              universe.

              Whether that's a reasonable limitation I don't know,
              simply because the information provided in this patch
              series is close to zero.

            - Preventing early timer expiration caused by frequency
              drift is not trivial either.

      TBH, just thinking about all of that makes me shudder and my
      knee-jerk reaction is: NO WAY!

Why the heck can't hardware people and system designers finally
understand that time is not something they can define at their
own peril?

The "Let's solve it in software so I don't have to think about it"
design approach strikes again. This has caused headaches for the past five
decades, but people obviously never learn.

That said, I'm open to solutions which are at least in the proximity of
sane, but that needs a lot more information about the use cases and the
implications and not just some handwavy 'we screwed up our system design
and therefore we need to inflict insanity on everyone' blurb.

Thanks,

        tglx
Meisinger, Andreas Oct. 9, 2020, 11:17 a.m. UTC | #4
Hello Mr Gleixner,
thanks for your feedback; we'll fix the issues not related to the time scale topic as soon as possible.

Regarding your concerns about not using the TAI timescale, we do admit that in many situations TAI makes a lot of things way easier and is therefore the way to go.

Yet we already have use cases where this can't be done. Additionally, a lot of discussion on this topic is ongoing in the 60802 profile creation too.
Some of our use cases require a network which does not depend on any external time source. This might be because the network is not connected (to the internet) or because the network may not be able to rely on or trust an external time source. Reasons for this might be safety, security, availability or legal implications (e.g. if a machine builder has to guarantee operation of a machine which depends on an internal TSN network).

About your question whether an application needs to be able to sync to multiple timescales: a small number of use cases would even require multiple independent time sources. At the moment they all seem to be in the area of extremely high availability. Evaluation of these issues is ongoing and we're not sure whether there's a way to do this without special hardware, so we didn't address it here.

In addition to these special cases, at least "reading" different time sources should be possible in all cases, e.g. to be able to log based on TAI while network operation relies on its own clock. Of course, the TAI timescale wouldn't have the same level of trust in this scenario.

Best regards
Andreas Meisinger

Siemens AG
Digital Industries
Process Automation
DI PA DCP TI
Gleiwitzer Str. 555
90475 Nürnberg, Deutschland
Tel.: +49 911 95822720
mailto:andreas.meisinger@siemens.com

www.siemens.com/ingenuityforlife

-----Original Message-----
From: Thomas Gleixner <tglx@linutronix.de>
Sent: Saturday, October 3, 2020 02:10
To: Vinicius Costa Gomes <vinicius.gomes@intel.com>; Geva, Erez (ext) (DI PA CI R&D 3) <erez.geva.ext@siemens.com>; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; Cong Wang <xiyou.wangcong@gmail.com>; David S . Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Jamal Hadi Salim <jhs@mojatatu.com>; Jiri Pirko <jiri@resnulli.us>; Andrei Vagin <avagin@gmail.com>; Dmitry Safonov <0x7f454c46@gmail.com>; Eric W . Biederman <ebiederm@xmission.com>; Ingo Molnar <mingo@kernel.org>; John Stultz <john.stultz@linaro.org>; Michal Kubecek <mkubecek@suse.cz>; Oleg Nesterov <oleg@redhat.com>; Peter Zijlstra <peterz@infradead.org>; Richard Cochran <richardcochran@gmail.com>; Stephen Boyd <sboyd@kernel.org>; Vladis Dronov <vdronov@redhat.com>; Sebastian Andrzej Siewior <bigeasy@linutronix.de>; Frederic Weisbecker <frederic@kernel.org>; Eric Dumazet <edumazet@google.com>
Cc: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>; Vedang Patel <vedang.patel@intel.com>; Sudler, Simon (DI PA DCP TI) <simon.sudler@siemens.com>; Meisinger, Andreas (DI PA CI R&D 3) <andreas.meisinger@siemens.com>; Bucher, Andreas (DI PA DCP R&D 3) <andreas.bucher@siemens.com>; Schild, Henning (T RDA IOT SES-DE) <henning.schild@siemens.com>; Kiszka, Jan (T RDA IOT SES-DE) <jan.kiszka@siemens.com>; Zirkler, Andreas (T RDA IOT INN-DE) <andreas.zirkler@siemens.com>; Sakic, Ermin (T RDA IOT INN-DE) <ermin.sakic@siemens.com>; Nguyen, An Ninh (DI FA TIP AAT 2) <anninh.nguyen@siemens.com>; Saenger, Michael (DI PA CI R&D 4) <michael.saenger@siemens.com>; Maehringer, Bernd (DI PA CI R&D 4) <bernd.maehringer@siemens.com>; Greinert, Gisela (DI PA CI R&D 4) <gisela.greinert@siemens.com>; Geva, Erez (ext) (DI PA CI R&D 3) <erez.geva.ext@siemens.com>; Erez Geva <ErezGeva2@gmail.com>
Subject: Re: [PATCH 0/7] TC-ETF support PTP clocks series

Thomas Gleixner Oct. 9, 2020, 3:39 p.m. UTC | #5
Andreas,

On Fri, Oct 09 2020 at 11:17, Andreas Meisinger wrote:

please do not top-post and trim your replies.

> Yet we do already have usecases where this can't be done. Additionally
> a lot of discussions at this topic are ongoing in 60802 profile
> creation too.  Some of our usecases do require a network which does
> not depend on any external timesource. This might be due to the
> network not being connected (to the internet) or just because the
> network may not be able to rely on or trust an external
> timesource. Some reasons for this might be safety, security,
> availability or legal implications ( e.g. if a machine builder has to
> guarantee operation of a machine which depends on an internal tsn
> network).

I'm aware of the reasons for this kind of setup.

> About your question if an application needs to be able to sync to
> multiple timescales. A small count of usecases even would require
> multiple independent timesources to be used. At the moment they all
> seem to be located in the area of extreme high availability. There's
> ongoing evaluation about this issues and we're not sure if there's a
> way to do this without special hardware so we didn't address it here.

Reading several raw PTP clocks is always possible through the existing
interfaces, and if the coordination between real TAI and the raw PTP
clocks is available, then these interfaces could be extended to provide
time normalized to real TAI.

But that does not allow utilizing the magic clocks for arming timers, so
these have to be based on some other clock and the application needs to do
the conversion back and forth.

Now, I said that we could abuse time namespaces for providing access to
_one_ magic TAI clock which lets the kernel do that work, but thinking
more about it, it should be possible to do so for all of them even
without namespaces.

The user space daemon which does the correlation between these PTP
domains and TAI is required in any case, so the magic clock TAI_PRIVATE
has no advantage.

If that correlation exists then at least clock_nanosleep() should be
doable. So clock_nanosleep(clock PTP/$N) would convert the sleep time to
TAI and queue a timer internally on the CLOCK_TAI base.

Depending on the frequency drift between CLOCK_TAI and clock PTP/$N the
timer expiry might be slightly inaccurate, but surely not more
inaccurate than if that conversion is done purely in user space.

The self-rearming POSIX timers would work too, but the self-rearming is
based on CLOCK_TAI, so rounding errors and drift would be cumulative.
So I'd rather stay away from them.

If there is no daemon which manages the correlation, then the syscall
would fail.

If such a coordination exists, then the whole problem in the TSN stack
is gone. The core can always operate on TAI and the network device which
runs in a different time universe would use the same conversion data
e.g. to queue a packet for HW based time triggered transmission. Again
subject to slight inaccuracy, but it does not come with all the problems
of dynamic clocks, locking issues etc. As the frequency drift between
PTP domains is neither fast changing nor randomly jumping around the
inaccuracy might even be a mostly academic problem.

Thoughts?

Thanks,

        tglx