
[RFC,08/11] clocksource: allow usage independent of timekeeping.c

Message ID 1227096528-24150-9-git-send-email-patrick.ohly@intel.com
State RFC, archived
Delegated to: David Miller

Commit Message

Patrick Ohly Nov. 19, 2008, 12:08 p.m. UTC
So far struct clocksource acted as the interface between time/timekeeping
and hardware. This patch generalizes the concept so that the same
interface can also be used in other contexts.

The only change as far as kernel/time/timekeeping is concerned is that
the hardware access can be done either with or without passing
the clocksource pointer as context. This is necessary in those
cases when there is more than one instance of the hardware.

The extensions in this patch add code which turns the raw cycle count
provided by hardware into a continuously increasing time value. This
reuses fields also used by timekeeping.c. Because of slightly different
semantics (__get_nsec_offset does not update cycle_last, clocksource_read_ns
does that transparently), timekeeping.c was not modified to use the
generalized code.

The new code does no locking of the clocksource. This is the responsibility
of the caller.
---
 include/linux/clocksource.h |  119 ++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 118 insertions(+), 1 deletions(-)

Comments

john stultz Dec. 5, 2008, 9:05 p.m. UTC | #1
On Wed, Nov 19, 2008 at 4:08 AM, Patrick Ohly <patrick.ohly@intel.com> wrote:
> So far struct clocksource acted as the interface between time/timekeeping
> and hardware. This patch generalizes the concept so that the same
> interface can also be used in other contexts.

Hey Patrick,
   Sorry for not noticing this thread earlier!

> The only change as far as kernel/time/timekeeping is concerned is that
> the hardware access can be done either with or without passing
> the clocksource pointer as context. This is necessary in those
> cases when there is more than one instance of the hardware.

So as a heads up, the bit about passing the clocksource to the
read_clock() function looks very similar to a bit of what Magnus Damm
was recently working on.

> The extensions in this patch add code which turns the raw cycle count
> provided by hardware into a continuously increasing time value. This
> reuses fields also used by timekeeping.c. Because of slightly different
> semantics (__get_nsec_offset does not update cycle_last, clocksource_read_ns
> does that transparently), timekeeping.c was not modified to use the
> generalized code.

Hrm.. I'm a little wary here. Your patch basically creates new
semantics for how the clocksource structure is used, which will likely
cause confusion. I'll agree that the clocksource structure has been
somewhat more cluttered with timekeeping-isms than I'd prefer, so
maybe your patches give us a reason to clean it up and better separate
the hardware clocksource accessor information from the timekeeping
state.

So to be clear, let me see if I understand your needs from your patch:

1) Need an interface to a counter, whose value monotonically increases.
2) Need to translate the counter to nanoseconds and nanoseconds back
to the counter
3) The counter will likely not be registered for use in timekeeping
4) The counter's sense of time will not be steered via frequency adjustments.

Is that about the right set of assumptions?

So if we break the clocksource structure into two portions (ignore
name details for now)

 struct counter {
      char *name;
      u32 mult;
      u32 shift;
      cycle_t mask;
      cycle_t (*read)(struct counter *);
      cycle_t (*vread)(void);

      /* bits needed here for real monotonic interface, more on that below */

      /* other arch specific needs */
 };

 struct timeclock {
      struct counter *counter;
      u32 adjusted_mult;
      cycle_t cycle_last;
      u32 flags;
      u64 xtime_nsec;
      s64 error;
      /* other timekeeping bits go here */
 };

So given that, do you think you'd be ok with using just the first
counter structure?

Now there's a sort of larger problem I've glossed over, specifically in
assumption #1 up there: the bit about the interface to the monotonic
counter. Many hardware counters wrap, and some wrap fairly
quickly. This means we need to have some sort of infrastructure to
periodically accumulate cycles into some "cycle store" value. As long
as the cycle store is 64 bits wide, we probably don't have to worry
about overflows (if I recall, 64 bits at 1 GHz gives us ~500 years).

Now, currently the timekeeping core does this for the active in-use
clocksource. However, if we have a number of counter structs that are
being used in different contexts, maybe three registered for
timekeeping, and a few more for different types of timestamping (maybe
audio, networking, maybe even performance counters?), we suddenly have
to do the accumulation step on quite a few counters to avoid
wrapping.

You dodged this accumulation infrastructure in your patch by just
accumulating at read time. This works as long as you can guarantee
that read events occur more often than the wrap frequency. And in most
cases that's probably not too hard, but with some in-development
work, like the -rt patches, kernel work (even most interrupt
processing) can be deferred by high-priority tasks for an unlimited
amount of time.

So this requires thinking things through a bit more, trying to
figure out how to create a guaranteed accumulation frequency, but only
do so on counters that are really actively in use (we don't want to
accumulate on counters that no one cares about). It's probably not too
much work, but we may want to consider other approaches as well.

Another issue that multiple clocksources can cause is dealing with
time intervals between clocksources. Different clocksources may be
driven by different crystals, so they will drift apart. Also, since the
clocksource used for timekeeping is adjusted by adjtimex(), you'll
likely have to deal with small differences in system time intervals
and clocksource time intervals.

I see you've maybe tried to address some of this with the following
time_sync patch, but I'm not sure I've totally grokked that patch yet.


Anyway, sorry to ramble on so much. I'm really interested in your
work, it's really interesting! But we might want to make sure the right
changes are being made in the right place so we don't get too much
confusion with the same variables meaning different things in
different contexts.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick Ohly Dec. 11, 2008, 12:11 p.m. UTC | #2
On Fri, 2008-12-05 at 21:05 +0000, john stultz wrote:
> On Wed, Nov 19, 2008 at 4:08 AM, Patrick Ohly <patrick.ohly@intel.com> wrote:
> > So far struct clocksource acted as the interface between time/timekeeping
> > and hardware. This patch generalizes the concept so that the same
> > interface can also be used in other contexts.
> 
> Hey Patrick,
>    Sorry for not noticing this thread earlier!

No problem, it's not holding up anything. The question of how to extend
skb hasn't been settled either. Thanks for taking the time to consider
it.

> > The extensions in this patch add code which turns the raw cycle count
> > provided by hardware into a continuously increasing time value. This
> > reuses fields also used by timekeeping.c. Because of slightly different
> > semantics (__get_nsec_offset does not update cycle_last, clocksource_read_ns
> > does that transparently), timekeeping.c was not modified to use the
> > generalized code.
> 
> Hrm.. I'm a little wary here. Your patch basically creates new
> semantics for how the clocksource structure is used, which will likely
> cause confusion.

That's true. I could keep the code separate, if that helps. I just
didn't want to duplicate the whole structure definition.

>   I'll agree that the clocksource structure has been
> somewhat more cluttered with timekeeping-isms than I'd prefer, so
> maybe your patches give us a reason to clean it up and better separate
> the hardware clocksource accessor information from the timekeeping
> state.
> 
> So to be clear, let me see if I understand your needs from your patch:
> 
> 1) Need an interface to a counter, whose value monotonically increases.
> 2) Need to translate the counter to nanoseconds and nanoseconds back
> to the counter

There are two additional ways of using the counter:
* Get nanosecond delay measurements (clocksource_read_ns). Calling this
  "resets" the counter.
* Get a continuously increasing timer value 
  (clocksource_init_time/clocksource_read_time). The clock is only reset
  when calling clocksource_init_time().

The two are mutually exclusive because clocksource_read_time() depends
on clocksource_read_ns(). If this is too confusing, then
clocksource_read_ns() could be turned into an internal helper function.
I left it in the header because there might be other uses for it. The
rest of the patches only needs clocksource_read_time().

Nanoseconds never have to be converted back to the counter. That
wouldn't be possible anyway (hardware counter might roll over, whereas
the clock counts nanoseconds in a 64 bit value and thus will last longer
than the hardware it runs on).

> 3) The counter will likely not be registered for use in timekeeping
> 4) The counter's sense of time will not be steered via frequency adjustments.
> 
> Is that about the right set of assumptions?

About right ;-)

> So if we break the clocksource structure into two portions (ignore
> name details for now)
> 
>  struct counter {
>       char *name;
>       u32 mult;
>       u32 shift;
>       cycle_t mask;
>       cycle_t (*read)(struct counter *);
>       cycle_t (*vread)(void);
> 
>       /* bits needed here for real monotonic interface, more on that below */
> 
>       /* other arch specific needs */
>  };
> 
>  struct timeclock {
>       struct counter *counter;
>       u32 adjusted_mult;
>       cycle_t cycle_last;
>       u32 flags;
>       u64 xtime_nsec;
>       s64 error;
>       /* other timekeeping bits go here */
>  };
> 
> So given that, do you think you'd be ok with using just the first
> counter structure?

Some additional members must be moved to struct counter:
* cycle_last (for the overflow handling)
* xtime_nsec (for the continuously increasing timer)

Apart from those, the first struct is okay.

> Now there's a sort of larger problem I've glossed over, specifically in
> assumption #1 up there: the bit about the interface to the monotonic
> counter. Many hardware counters wrap, and some wrap fairly
> quickly.
[...]
> You dodged this accumulation infrastructure in your patch by just
> accumulating at read time. This works as long as you can guarantee
> that read events occur more often than the wrap frequency.

Exactly. My plan was that the user of such a custom clocksource is
responsible for querying it often enough so that clocksource_read_ns()
can detect the wrap around. This works in the context of PTP (which
causes regular events). Network driver developers must be a bit careful
when there is no active PTP daemon: either they reinitialize the timer
when it starts to get used or probe it automatically after certain
delays.

>  And in most
> cases that's probably not too hard, but with some in-development
> work, like the -rt patches, kernel work (even most interrupt
> processing) can be deferred by high-priority tasks for an unlimited
> amount of time.

I'm not sure what can be done in such a case. Use decent hardware which
doesn't wrap around so quickly, I guess. It's not an issue with the
Intel NIC (sorry for the advertising... ;-)

> Another issue that multiple clocksources can cause is dealing with
> time intervals between clocksources. Different clocksources may be
> driven by different crystals, so they will drift apart. Also, since the
> clocksource used for timekeeping is adjusted by adjtimex(), you'll
> likely have to deal with small differences in system time intervals
> and clocksource time intervals.
> 
> I see you've maybe tried to address some of this with the following
> time_sync patch, but I'm not sure I've totally grokked that patch yet.

The clocksource API extension and the time sync code are independent at
the moment: the time sync code assumes that it gets two, usually
increasing timer values and tries to match them by measuring skew and
drift between them. If the timer values jump, then the sync code adapts
these values accordingly.

I don't think it will be necessary to add something like adjtimex() to a
clocksource. Either the hardware supports it natively (like the Intel
NIC does, sorry again), or the current time sync deals with frequency
changes by adapting the drift factor.

> Anyway, sorry to ramble on so much. I'm really interested in your
> work, it's really interesting! But we might want to make sure the right
> changes are being made in the right place so we don't get too much
> confusion with the same variables meaning different things in
> different contexts.

Thanks for your comments. I agree that splitting the structures would
help. But the variables really have the same meaning. They are just used
in different functions.
john stultz Dec. 11, 2008, 10:23 p.m. UTC | #3
On Thu, 2008-12-11 at 13:11 +0100, Patrick Ohly wrote:
> On Fri, 2008-12-05 at 21:05 +0000, john stultz wrote:
> > On Wed, Nov 19, 2008 at 4:08 AM, Patrick Ohly <patrick.ohly@intel.com> wrote:
> > > The extensions in this patch add code which turns the raw cycle count
> > > provided by hardware into a continuously increasing time value. This
> > > reuses fields also used by timekeeping.c. Because of slightly different
> > > semantics (__get_nsec_offset does not update cycle_last, clocksource_read_ns
> > > does that transparently), timekeeping.c was not modified to use the
> > > generalized code.
> > 
> > Hrm.. I'm a little wary here. Your patch basically creates new
> > semantics for how the clocksource structure is used, which will likely
> > cause confusion.
> 
> That's true. I could keep the code separate, if that helps. I just
> didn't want to duplicate the whole structure definition.

I think either keeping it separate, using your own structure, or
properly splitting out the  counter / time-clock interface would be the
way to go.

> >   I'll agree that the clocksource structure has been
> > somewhat more cluttered with timekeeping-isms than I'd prefer, so
> > maybe your patches give us a reason to clean it up and better separate
> > the hardware clocksource accessor information from the timekeeping
> > state.
> > 
> > So to be clear, let me see if I understand your needs from your patch:
> > 
> > 1) Need an interface to a counter, whose value monotonically increases.
> > 2) Need to translate the counter to nanoseconds and nanoseconds back
> > to the counter
> 
> There are two additional ways of using the counter:
> * Get nanosecond delay measurements (clocksource_read_ns). Calling this
>   "resets" the counter.

Just so I understand, do you mean clocksource_read_ns() returns the
number of nanoseconds since the last call to clocksource_read_ns()?

That seems like an odd interface to define, since effectively you're
storing state inside the interface. 

Why exactly is this useful, as opposed to creating a monotonically
increasing function which can be sampled and the state is managed by the
users of the interface?


> * Get a continuously increasing timer value 
>   (clocksource_init_time/clocksource_read_time). The clock is only reset
>   when calling clocksource_init_time().

So a monotonic 64-bit wide counter. Close to what I described above. Is
there actually a need for it ever to reset?


> The two are mutually exclusive because clocksource_read_time() depends
> on clocksource_read_ns(). If this is too confusing, then
> clocksource_read_ns() could be turned into an internal helper function.
> I left it in the header because there might be other uses for it. The
> rest of the patches only needs clocksource_read_time().

Yea. It seems like an odd interface, as the internal state seems to
limit its use.

> Nanoseconds never have to be converted back to the counter. That
> wouldn't be possible anyway (hardware counter might roll over, whereas
> the clock counts nanoseconds in a 64 bit value and thus will last longer
> than the hardware it runs on).

Right, but if it's a monotonically increasing 64-bit counter, rollover
isn't likely an issue. I think we're basically communicating the same
idea here; the question is just whether you want the interface to
provide nanoseconds or cycles.


> > 3) The counter will likely not be registered for use in timekeeping
> > 4) The counter's sense of time will not be steered via frequency adjustments.
> > 
> > Is that about the right set of assumptions?
> 
> About right ;-)
> 
> > So if we break the clocksource structure into two portions (ignore
> > name details for now)
> > 
> >  struct counter {
> >       char *name;
> >       u32 mult;
> >       u32 shift;
> >       cycle_t mask;
> >       cycle_t (*read)(struct counter *);
> >       cycle_t (*vread)(void);
> > 
> >       /* bits needed here for real monotonic interface, more on that below */
> > 
> >       /* other arch specific needs */
> >  };
> > 
> >  struct timeclock {
> >       struct counter *counter;
> >       u32 adjusted_mult;
> >       cycle_t cycle_last;
> >       u32 flags;
> >       u64 xtime_nsec;
> >       s64 error;
> >       /* other timekeeping bits go here */
> >  };
> > 
> > So given that, do you think you'd be ok with using just the first
> > counter structure?
> 
> Some additional members must be moved to struct counter:
> * cycle_last (for the overflow handling)
> * xtime_nsec (for the continuously increasing timer)

Hmm. I'd still prefer those values to be stored elsewhere. As you add
state to the structure, that limits how the structure can be used. For
instance, if cycle_last and xtime_nsec are in the counter structure,
then that means one counter could not be used for both timekeeping and
the hardware time-stamping you're doing.

Instead, that state should be stored in the timekeeping and timestamping
structures respectively.

> Apart from those, the first struct is okay.
> 
> > Now there's a sort of larger problem I've glossed over, specifically in
> > assumption #1 up there: the bit about the interface to the monotonic
> > counter. Many hardware counters wrap, and some wrap fairly
> > quickly.
> [...]
> > You dodged this accumulation infrastructure in your patch by just
> > accumulating at read time. This works as long as you can guarantee
> > that read events occur more often than the wrap frequency.
> 
> Exactly. My plan was that the user of such a custom clocksource is
> responsible for querying it often enough so that clocksource_read_ns()
> can detect the wrap around.

Right, however my point quoted below was that this will likely break in
the -rt kernel, since those users may be deferred for an undefined amount
of time. So we'll need to do something here.


> >  And in most
> > cases that's probably not too hard, but with some in-development
> > work, like the -rt patches, kernel work (even most interrupt
> > processing) can be deferred by high-priority tasks for an unlimited
> > amount of time.
> 
> I'm not sure what can be done in such a case. Use decent hardware which
> doesn't wrap around so quickly, I guess. It's not an issue with the
> Intel NIC (sorry for the advertising... ;-)

Well, I think it would be good to create an infrastructure that will work
on most hardware.

And I think it can work, but in order to make it work cleanly, we'll
have to have some form of accumulation infrastructure which cannot
be deferred.

However, some careful thought will be needed here, so that we don't
create latencies by wasting time sampling unused hardware counters in
the hardirq context.


> > Another issue that multiple clocksources can cause is dealing with
> > time intervals between clocksources. Different clocksources may be
> > driven by different crystals, so they will drift apart. Also, since the
> > clocksource used for timekeeping is adjusted by adjtimex(), you'll
> > likely have to deal with small differences in system time intervals
> > and clocksource time intervals.
> > 
> > I see you've maybe tried to address some of this with the following
> > time_sync patch, but I'm not sure I've totally grokked that patch yet.
> 
> The clocksource API extension and the time sync code are independent at
> the moment: the time sync code assumes that it gets two, usually
> increasing timer values and tries to match them by measuring skew and
> drift between them. If the timer values jump, then the sync code adapts
> these values accordingly.

Ok. I'll have to spend some more time on that patch, but it sounds like
you're handling the issue.


> > Anyway, sorry to ramble on so much. I'm really interested in your
> > work, it's really interesting! But we might want to make sure the right
> > changes are being made in the right place so we don't get too much
> > confusion with the same variables meaning different things in
> > different contexts.
> 
> Thanks for your comments. I agree that splitting the structures would
> help. But the variables really have the same meaning. They are just used
> in different functions.

Err, you might be misunderstanding their current meaning. However, it's
not your fault, as the naming is not as clear as I'd like. For instance,
xtime_nsec stores the sub-nanoseconds (shifted up by clocksource->shift)
not represented in the xtime value.

So yes, while you likely want to keep similar state to what the timekeeping
core does, I really think splitting it out fully is going to be the way
to go.

Thanks for the consideration of my comments! I look forward to your
future patches!
-john


Patrick Ohly Dec. 12, 2008, 8:50 a.m. UTC | #4
On Thu, 2008-12-11 at 22:23 +0000, john stultz wrote:
> On Thu, 2008-12-11 at 13:11 +0100, Patrick Ohly wrote:
> > That's true. I could keep the code separate, if that helps. I just
> > didn't want to duplicate the whole structure definition.
> 
> I think either keeping it separate, using your own structure, or
> properly splitting out the  counter / time-clock interface would be the
> way to go.

Okay, will do that. I'll try to do it so that the clocksource code can
later be rewritten to use the same definition.

> > There are two additional ways of using the counter:
> > * Get nanosecond delay measurements (clocksource_read_ns). Calling this
> >   "resets" the counter.
> 
> Just so I understand, do you mean clocksource_read_ns() returns the
> number of nanoseconds since the last call to clocksource_read_ns()?

Yes.

> Why exactly is this useful, as opposed to creating a monotonically
> increasing function which can be sampled and the state is managed by the
> users of the interface?

The monotonically increasing function is already based on a stateful
function which calculates the delta; calculating the original delta
based on a derived value didn't seem right. But I don't really care much
about this part of the API, so I'll just make it internal.

> > * Get a continuously increasing timer value 
> >   (clocksource_init_time/clocksource_read_time). The clock is only reset
> >   when calling clocksource_init_time().
> 
> So a monotonic 64-bit wide counter. Close to what I described above. Is
> there actually a need for it ever to reset?

Perhaps. A device might decide to reset the time each time hardware time
stamping is activated.

> > Some additional members must be moved to struct counter:
> > * cycle_last (for the overflow handling)
> > * xtime_nsec (for the continuously increasing timer)
> 
> Hmm. I'd still prefer those values to be stored elsewhere. As you add
> state to the structure, that limits how the structure can be used. For
> instance, if cycle_last and xtime_nsec are in the counter structure,
> then that means one counter could not be used for both timekeeping and
> the hardware time-stamping you're doing.

The clean solution would be
* struct cyclecounter: abstract API to access hardware cycle counter
  The cycle counter may roll over relatively quickly. The implementor
  needs to provide information about the width of the counter and its
  frequency.
* struct timecounter: turns cycles from one cyclecounter into a 
  nanosecond count
  Must detect and deal with cycle counter overflows. Uses a 64-bit
  counter for time, so it itself doesn't overflow (unless we build
  hardware that runs for a *really* long time).

Now, should struct timecounter contain a struct cyclecounter or a
pointer to it? A pointer is more flexible, but overkill for the usage I
had in mind. I'll use a pointer anyway, just in case.

> Instead, that state should be stored in the timekeeping and timestamping
> structures respectively.

I'm not sure whether timestamping can be separated from timekeeping: it
depends on the same cycle counter state as the timekeeping.

> > > You dodged this accumulation infrastructure in your patch by just
> > > accumulating at read time. This works as long as you can guarantee
> > > that read events occur more often than the wrap frequency.
> > 
> > Exactly. My plan was that the user of such a custom clocksource is
> > responsible for querying it often enough so that clocksource_read_ns()
> > can detect the wrap around.
> 
> Right, however my point quoted below was that this will likely break in
> the -rt kernel, since those users may be deferred for an undefined amount
> of time. So we'll need to do something here.

If the code isn't called often enough to deal with the regular PTP Sync
messages (sent every two seconds), then such a system would already have
quite a few other problems.

> > >  And in most
> > > cases that's probably not too hard, but with some in-development
> > > work, like the -rt patches, kernel work (even most interrupt
> > > processing) can be deferred by high-priority tasks for an unlimited
> > > amount of time.
> > 
> > I'm not sure what can be done in such a case. Use decent hardware which
> > doesn't wrap around so quickly, I guess. It's not an issue with the
> > Intel NIC (sorry for the advertising... ;-)
> 
> Well, I think it would be good to create an infrastructure that will work
> on most hardware.

Most hardware doesn't have hardware time stamping. Is there any hardware
which has hardware time stamping, but only with such a limited counter
that we run into this problem?

I agree that this problem needs to be taken into account now (while
designing these data structures) and be addressed as soon as it becomes
necessary - but not sooner. Otherwise we might end up with dead code
that isn't used at all.

> And I think it can work, but in order to make it work cleanly, we'll
> have to have some form of accumulation infrastructure which cannot
> be deferred.
> 
> However, some careful thought will be needed here, so that we don't
> create latencies by wasting time sampling unused hardware counters in
> the hardirq context.

Currently the structures are owned by the device driver which owns the
hardware. Perhaps the device driver could register the structure with
such an accumulation infrastructure if the driver itself cannot
guarantee that it will check the cycle counter often enough. Concurrent
access to the cycle counter hardware and state could make this tricky.

This goes into areas where I have no experience at all, so I would
depend on others to provide that code.

Patch

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index f88d32f..5435bd2 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -24,6 +24,9 @@  struct clocksource;
 /**
  * struct clocksource - hardware abstraction for a free running counter
  *	Provides mostly state-free accessors to the underlying hardware.
+ *      Also provides utility functions which convert the underlying
+ *      hardware cycle values into a non-decreasing count of nanoseconds
+ *      ("time").
  *
  * @name:		ptr to clocksource name
  * @list:		list head for registration
@@ -43,6 +46,9 @@  struct clocksource;
  *				The ideal clocksource. A must-use where
  *				available.
  * @read:		returns a cycle value
+ * @read_clock:         alternative to read which gets a pointer to the clock
+ *                      source so that the same code can read different clocks;
+ *                      either read or read_clock must be set
  * @mask:		bitmask for two's complement
  *			subtraction of non 64 bit counters
  * @mult:		cycle to nanosecond multiplier (adjusted by NTP)
@@ -62,6 +68,7 @@  struct clocksource {
 	struct list_head list;
 	int rating;
 	cycle_t (*read)(void);
+	cycle_t (*read_clock)(struct clocksource *cs);
 	cycle_t mask;
 	u32 mult;
 	u32 mult_orig;
@@ -170,7 +177,7 @@  static inline u32 clocksource_hz2mult(u32 hz, u32 shift_constant)
  */
 static inline cycle_t clocksource_read(struct clocksource *cs)
 {
-	return cs->read();
+	return (cs->read ? cs->read() : cs->read_clock(cs));
 }
 
 /**
@@ -190,6 +197,116 @@  static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
 }
 
 /**
+ * clocksource_read_ns - get nanoseconds since last call of this function
+ *                       (never negative)
+ * @cs:         Pointer to clocksource
+ *
+ * When the underlying cycle counter rolls over, this will be handled
+ * correctly as long as it does not roll over more than once between
+ * calls.
+ *
+ * The first call to this function for a new clock source initializes
+ * the time tracking and returns bogus results.
+ */
+static inline s64 clocksource_read_ns(struct clocksource *cs)
+{
+	cycle_t cycle_now, cycle_delta;
+	s64 ns_offset;
+
+	/* read clocksource: */
+	cycle_now = clocksource_read(cs);
+
+	/* calculate the delta since the last clocksource_read_ns: */
+	cycle_delta = (cycle_now - cs->cycle_last) & cs->mask;
+
+	/* convert to nanoseconds: */
+	ns_offset = cyc2ns(cs, cycle_delta);
+
+	/* update time stamp of clocksource_read_ns call: */
+	cs->cycle_last = cycle_now;
+
+	return ns_offset;
+}
+
+/**
+ * clocksource_init_time - initialize a clock source for use with
+ *                         %clocksource_read_time() and
+ *                         %clocksource_cyc2time()
+ * @cs:            Pointer to clocksource.
+ * @start_tstamp:  Arbitrary initial time stamp.
+ *
+ * After this call the current cycle register (roughly) corresponds to
+ * the initial time stamp. Every call to %clocksource_read_time()
+ * increments the time stamp counter by the number of elapsed
+ * nanoseconds.
+ */
+static inline void clocksource_init_time(struct clocksource *cs,
+					u64 start_tstamp)
+{
+	cs->cycle_last = clocksource_read(cs);
+	cs->xtime_nsec = start_tstamp;
+}
+
+/**
+ * clocksource_read_time - return nanoseconds since %clocksource_init_time()
+ *                         plus the initial time stamp
+ * @cs:          Pointer to clocksource.
+ *
+ * In other words, keeps track of time since the same epoch as
+ * the function which generated the initial time stamp. Don't mix
+ * with calls to %clocksource_read_ns()!
+ */
+static inline u64 clocksource_read_time(struct clocksource *cs)
+{
+	u64 nsec;
+
+	/* increment time by nanoseconds since last call */
+	nsec = clocksource_read_ns(cs);
+	nsec += cs->xtime_nsec;
+	cs->xtime_nsec = nsec;
+
+	return nsec;
+}
+
+/**
+ * clocksource_cyc2time - convert an absolute cycle time stamp to same
+ *                        time base as values returned by
+ *                        %clocksource_read_time()
+ * @cs:            Pointer to clocksource.
+ * @cycle_tstamp:  a value returned by cs->read()
+ *
+ * Cycle time stamps are converted correctly as long as they
+ * fall into the time interval [-1/2 max cycle count, +1/2 max cycle count],
+ * with "max cycle count" == cs->mask+1.
+ *
+ * This avoids situations where a cycle time stamp is generated, the
+ * current cycle counter is updated, and then when transforming the
+ * time stamp the value is treated as if it were in the future. Always
+ * updating the cycle counter would also work, but incur additional
+ * overhead.
+ */
+static inline u64 clocksource_cyc2time(struct clocksource *cs,
+				cycle_t cycle_tstamp)
+{
+	u64 cycle_delta = (cycle_tstamp - cs->cycle_last) & cs->mask;
+	u64 nsec;
+
+	/*
+	 * Instead of always treating cycle_tstamp as more recent
+	 * than cs->cycle_last, detect when it is too far in the
+	 * future and treat it as old time stamp instead.
+	 */
+	if (cycle_delta > cs->mask / 2) {
+		cycle_delta = (cs->cycle_last - cycle_tstamp) & cs->mask;
+		nsec = cs->xtime_nsec - cyc2ns(cs, cycle_delta);
+	} else {
+		nsec = cyc2ns(cs, cycle_delta) + cs->xtime_nsec;
+	}
+
+	return nsec;
+}
+
+/**
  * clocksource_calculate_interval - Calculates a clocksource interval struct
  *
  * @c:		Pointer to clocksource.