diff mbox

[RFC,11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time

Message ID 1226415447.31699.10.camel@ecld0pohly
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Patrick Ohly Nov. 5, 2008, 9:58 a.m. UTC
Currently only mapping from clock source to system time is implemented.
The interface could have been made more versatile by not depending on a clock source,
but this wasn't done to avoid writing glue code elsewhere.

The method implemented here is the one used and analyzed under the name
"assisted PTP" in the LCI PTP paper:
http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
 include/linux/clocksync.h |  141 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/time/Makefile      |    2 +-
 kernel/time/clocksync.c   |  108 ++++++++++++++++++++++++++++++++++
 3 files changed, 250 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/clocksync.h
 create mode 100644 kernel/time/clocksync.c

Comments

Andi Kleen Nov. 11, 2008, 4:18 p.m. UTC | #1
> +
> +int clocksync_offset(struct clocksync *sync,
> +		s64 *offset,
> +		u64 *hwtstamp)
> +{
> +	u64 starthw = 0, endhw = 0;
> +	struct {
> +		s64 offset;
> +		s64 duration_sys;
> +	} samples[100], 

That should be separately allocated to avoid potential stack overflow.

Also as a style nit there are normally no {} around single line
statements.


-Andi
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick Ohly Nov. 12, 2008, 8:01 a.m. UTC | #2
On Tue, 2008-11-11 at 16:18 +0000, Andi Kleen wrote:
> > +
> > +int clocksync_offset(struct clocksync *sync,
> > +		s64 *offset,
> > +		u64 *hwtstamp)
> > +{
> > +	u64 starthw = 0, endhw = 0;
> > +	struct {
> > +		s64 offset;
> > +		s64 duration_sys;
> > +	} samples[100], 
> 
> That should be separately allocated to avoid potential stack overflow.

Good catch. "make checkstack" also complains about it, but I didn't get
around to fixing it yet.

I'd prefer to allocate a very small array on the stack (10 entries = 160
bytes) and only fall back to dynamic allocation if the user of clocksync
wants more samples.

> Also as a style nit there are normally no {} around single line
> statements.

This is the part of the CodingStyle that I had most trouble adapting to
because a) I wrote a lot of code where the required style explicitly
asked for {} and b) I can think of several reasons for adding them
always and only one for not adding them.

Anyway, I'll try to keep this in mind, but would prefer to not reformat
the patches unless I have to touch them for other reasons.
David Miller Nov. 12, 2008, 10:05 a.m. UTC | #3
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Wed, 5 Nov 2008 10:58:39 +0100

> Currently only mapping from clock source to system time is implemented.
> The interface could have been made more versatile by not depending on a clock source,
> but this wasn't done to avoid writing glue code elsewhere.
> 
> The method implemented here is the one used and analyzed under the name
> "assisted PTP" in the LCI PTP paper:
> http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf
> 
> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>

Like patch 9, this will need a full review on linux-kernel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 12, 2008, 10:08 a.m. UTC | #4
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Wed, 12 Nov 2008 09:01:38 +0100

> Anyway, I'll try to keep this in mind, but would prefer to not reformat
> the patches unless I have to touch them for other reasons.

That distracts the eyes of the people reviewing the code, because
such people spend most of their time reading code that conforms
to the proper kernel coding style.

Therefore, please fix up these issues rather than defer them.
What does it take like 5 minutes of your time?  About the same
amount of time it took you to say you would defer it?  Come on...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick Ohly Nov. 12, 2008, 4:14 p.m. UTC | #5
On Wed, 2008-11-12 at 10:08 +0000, David Miller wrote:
> From: Patrick Ohly <patrick.ohly@intel.com>
> Date: Wed, 12 Nov 2008 09:01:38 +0100
> 
> > Anyway, I'll try to keep this in mind, but would prefer to not reformat
> > the patches unless I have to touch them for other reasons.
> 
> That distracts the eyes of the people reviewing the code, because
> such people spend most of their time reading code that conforms
> to the proper kernel coding style.

You are right of course. I have changed this and also addressed the
other comments. I'll give it a few more days in case that there are
further comments, then resubmit with linux-kernel on CC.

Should I rebase against net-2.6 or net-next-2.6?
Eric Dumazet Nov. 12, 2008, 4:28 p.m. UTC | #6
Patrick Ohly a écrit :
> On Wed, 2008-11-12 at 10:08 +0000, David Miller wrote:
>> From: Patrick Ohly <patrick.ohly@intel.com>
>> Date: Wed, 12 Nov 2008 09:01:38 +0100
>>
>>> Anyway, I'll try to keep this in mind, but would prefer to not reformat
>>> the patches unless I have to touch them for other reasons.
>> That distracts the eyes of the people reviewing the code, because
>> such people spend most of their time reading code that conforms
>> to the proper kernel coding style.
> 
> You are right of course. I have changed this and also addressed the
> other comments. I'll give it a few more days in case that there are
> further comments, then resubmit with linux-kernel on CC.
> 
> Should I rebase against net-2.6 or net-next-2.6?
> 

net-next-2.6 is the tree you want to use for new network developments

net-2.6 is for bug fixes only


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/clocksync.h b/include/linux/clocksync.h
new file mode 100644
index 0000000..a5bec6f
--- /dev/null
+++ b/include/linux/clocksync.h
@@ -0,0 +1,141 @@ 
+/*
+ * Utility code which helps transforming between hardware time stamps
+ * generated by a clocksource and system time. The clocksource is
+ * assumed to return monotonically increasing time (but this code does
+ * its best to compensate if that is not the case) whereas system time
+ * may jump.
+ */
+#ifndef _LINUX_CLOCKSYNC_H
+#define _LINUX_CLOCKSYNC_H
+
+#include <linux/clocksource.h>
+#include <linux/ktime.h>
+
+/**
+ * struct clocksync - stores state and configuration for the two clocks
+ *
+ * Initialize to zero, then set clock, systime, num_samples.
+ *
+ * Transformation between HW time and system time is done with:
+ * HW time transformed = HW time + offset +
+ *                       (HW time - last_update) * skew / CLOCKSYNC_SKEW_RESOLUTION
+ *
+ * @clock:           the source for HW time stamps (%clocksource_read_time)
+ * @systime:         function returning current system time (ktime_get
+ *                   for monotonic time, or ktime_get_real for wall clock)
+ * @num_samples:     number of times that HW time and system time are to
+ *                   be compared when determining their offset
+ * @offset:          (system time - HW time) at the time of the last update
+ * @skew:            average (system time - HW time) / delta HW time *
+ *                   CLOCKSYNC_SKEW_RESOLUTION
+ * @last_update:     last HW time stamp when clock offset was measured
+ */
+struct clocksync {
+	struct clocksource *clock;
+	union ktime (*systime)(void);
+	int num_samples;
+
+	s64 offset;
+	s64 skew;
+	u64 last_update;
+};
+
+/**
+ * CLOCKSYNC_SKEW_RESOLUTION - fixed point arithmetic scale factor for skew
+ *
+ * Usually one would measure skew in ppb (parts per billion, 1e9), but
+ * using a factor of 2 simplifies the math.
+ */
+#define CLOCKSYNC_SKEW_RESOLUTION (((s64)1)<<30)
+
+/**
+ * clocksync_hw2sys - transform HW time stamp into corresponding system time
+ * @sync:             context for clock sync
+ * @hwtstamp:         the result of %clocksource_read_time or
+ *                    %clocksource_cyc2time
+ */
+static inline union ktime clocksync_hw2sys(struct clocksync *sync,
+					u64 hwtstamp)
+{
+	u64 nsec;
+
+	nsec = hwtstamp + sync->offset;
+	nsec += (s64)(hwtstamp - sync->last_update) * sync->skew /
+		CLOCKSYNC_SKEW_RESOLUTION;
+
+	return ns_to_ktime(nsec);
+}
+
+/**
+ * clocksync_offset - measure current (system time - HW time) offset
+ * @sync:             context for clock sync
+ * @offset:           average offset during sample period returned here
+ * @hwtstamp:         average HW time during sample period returned here
+ *
+ * Returns number of samples used. Might be zero (= no result) in the
+ * unlikely case that system time was monotonically decreasing for all
+ * samples (= broken).
+ */
+int clocksync_offset(struct clocksync *sync,
+		s64 *offset,
+		u64 *hwtstamp);
+
+/**
+ * clocksync_update - update offset and skew by measuring current offset
+ * @sync:             context for clock sync
+ * @hwtstamp:         the result of %clocksource_read_time or
+ *                    %clocksource_cyc2time, pass zero to force update
+ *
+ * Updates are only done at most once per second.
+ */
+static inline void clocksync_update(struct clocksync *sync,
+			u64 hwtstamp)
+{
+	s64 offset;
+	u64 average_time;
+
+	if (hwtstamp &&
+		(s64)(hwtstamp - sync->last_update) < NSEC_PER_SEC) {
+		return;
+	}
+
+	if (!clocksync_offset(sync, &offset, &average_time)) {
+		return;
+	}
+
+	printk(KERN_DEBUG
+		"average offset: %lld\n", offset);
+
+	if (!sync->last_update) {
+		sync->last_update = average_time;
+		sync->offset = offset;
+		sync->skew = 0;
+	} else {
+		s64 delta_nsec = average_time - sync->last_update;
+
+		/* avoid division by negative or small deltas */
+		if (delta_nsec >= 10000) {
+			s64 delta_offset_nsec = offset - sync->offset;
+			s64 skew = delta_offset_nsec *
+				CLOCKSYNC_SKEW_RESOLUTION /
+				delta_nsec;
+
+			/**
+			 * Calculate new overall skew as 4/16 the
+			 * old value and 12/16 the new one. This is
+			 * a rather arbitrary tradeoff between
+			 * only using the latest measurement (0/16 and
+			 * 16/16) and even more weight on past measurements.
+			 */
+#define CLOCKSYNC_NEW_SKEW_PER_16 12
+			sync->skew =
+				((16 - CLOCKSYNC_NEW_SKEW_PER_16) * sync->skew +
+					CLOCKSYNC_NEW_SKEW_PER_16 * skew) /
+				16;
+			sync->last_update = average_time;
+			sync->offset = offset;
+		}
+	}
+}
+
+#endif /* _LINUX_CLOCKSYNC_H */
diff --git a/kernel/time/Makefile b/kernel/time/Makefile
index 905b0b5..6279fb0 100644
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -1,4 +1,4 @@ 
-obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o
+obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o clocksync.o
 
 obj-$(CONFIG_GENERIC_CLOCKEVENTS_BUILD)		+= clockevents.o
 obj-$(CONFIG_GENERIC_CLOCKEVENTS)		+= tick-common.o
diff --git a/kernel/time/clocksync.c b/kernel/time/clocksync.c
new file mode 100644
index 0000000..8942ab5
--- /dev/null
+++ b/kernel/time/clocksync.c
@@ -0,0 +1,108 @@ 
+/*
+ * Utility code which helps transforming between hardware time stamps
+ * generated by a clocksource and system time.
+ *
+ * Copyright (C) 2008 Intel, Patrick Ohly (patrick.ohly@intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/clocksync.h>
+#include <linux/module.h>
+
+int clocksync_offset(struct clocksync *sync,
+		s64 *offset,
+		u64 *hwtstamp)
+{
+	u64 starthw = 0, endhw = 0;
+	struct {
+		s64 offset;
+		s64 duration_sys;
+	} samples[100], sample;
+	int counter = 0, i;
+	int used;
+	int index;
+	int num_samples = sync->num_samples;
+
+	if (num_samples > sizeof(samples)/sizeof(samples[0])) {
+		num_samples = sizeof(samples)/sizeof(samples[0]);
+	}
+
+	/* run until we have enough valid samples, but do not try forever */
+	i = 0;
+	counter = 0;
+	while (1) {
+		u64 ts;
+		union ktime start, end;
+
+		start = sync->systime();
+		ts = clocksource_read_time(sync->clock);
+		end = sync->systime();
+
+		if (!i) {
+			starthw = ts;
+		}
+
+		/* ignore negative durations */
+		sample.duration_sys = ktime_to_ns(ktime_sub(end, start));
+		if (sample.duration_sys >= 0) {
+			/*
+			 * assume symetric delay to and from HW: average system time
+			 * corresponds to measured HW time
+			 */
+			sample.offset = ktime_to_ns(ktime_add(end, start)) / 2 -
+				ts;
+
+			/* simple insertion sort based on duration */
+			index = counter - 1;
+			while (index >= 0) {
+				if(samples[index].duration_sys < sample.duration_sys) {
+					break;
+				}
+				samples[index + 1] = samples[index];
+				index--;
+			}
+			samples[index + 1] = sample;
+			counter++;
+		}
+
+		i++;
+		if (counter >= num_samples || i >= 100000) {
+			endhw = ts;
+			break;
+		}
+	}
+
+	*hwtstamp = (endhw + starthw) / 2;
+
+	/* remove outliers by only using 75% of the samples */
+	used = counter * 3 / 4;
+	if (!used) {
+		used = counter;
+	}
+	if (used) {
+		/* calculate average */
+		s64 off = 0;
+		for (index = 0; index < used; index++) {
+			off += samples[index].offset;
+		}
+		off /= used;
+		*offset = off;
+	}
+
+	return used;
+}
+
+EXPORT_SYMBOL_GPL(clocksync_offset);