The following graphs show the fluctuations in the rates of the system clock and of the real time clock on a variety of computers on the theory network. Until June all were synchronized against the same system, ntp.ubc.ca, a stratum 2 ntp server on campus (the time delay to that machine from any of these computers is of the order of hundreds of microseconds). As the top graph shows, that server had a 3-4 msec sawtooth drift against GPS time. Thereafter, string was synchronized against tick.usask.ca, a stratum 1 server synchronized against GPS. In Sept, 2007, string was put onto ntp and synchronized against a stratum 0 GPS clock (a Garmin 18LV GPS receiver with a PPS output), against which it maintains an offset of roughly 2-3 microseconds. All of the other clocks are synchronized against string using chrony. String is less than a msec away, via switches, from all of the other clocks.
The following graphs plot the rate of the system clock vs the ntp server (red line and left hand scale) and the rate of the RTC (the real time clock, ie the CMOS clock) vs the system clock (dotted lines and right hand scale) against the time in days after 00:00 on the date shown. The rates are in units of microseconds per second (ie parts per million). The rate of the system clock is determined by comparing its readings with the times obtained from the NTP server, and the rate of the RTC by comparing its readings with the system clock. Note that the strong correlation between the rate fluctuations suggests that the system clock is the primary source of noise, and that in general the RTC has better stability than does the system clock.
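As a rough illustration of what a rate in microseconds per second means, the sketch below estimates it as the least-squares slope of measured offset against time. This is only a minimal sketch and not chrony's actual algorithm; the function name rate_ppm and the sample values are hypothetical.

```python
def rate_ppm(times_s, offsets_s):
    """Least-squares slope of offset versus time, in microseconds per second."""
    n = len(times_s)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times_s) / n
    mean_o = sum(offsets_s) / n
    num = sum((t - mean_t) * (o - mean_o) for t, o in zip(times_s, offsets_s))
    den = sum((t - mean_t) ** 2 for t in times_s)
    if den == 0:
        raise ValueError("sample times must not all be equal")
    # The slope is dimensionless (seconds per second); scale to microseconds per second.
    return num / den * 1e6


# Hypothetical samples: offset of the system clock from the server (in seconds),
# measured every 256 seconds while the clock gains 1.28 ms per interval.
sample_times = [0.0, 256.0, 512.0, 768.0]
sample_offsets = [0.0, 0.00128, 0.00256, 0.00384]

print(rate_ppm(sample_times, sample_offsets))
```

With these made-up samples the estimated rate is about 5 microseconds per second.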
In the graphs for the week ending Feb 11, the huge instability in the case
of one of the machines, info, and of the other machines after they were
restarted on Feb 9, is unexplained. There seems to be an instability in the
operation of chrony. The restoration of a semblance of order after the 10th
was done by decreasing the maxupdateskew to 1/5 (from unlimited). Dilaton
was the most accurate clock in its rate fluctuations before that restart,
but not afterwards.
Well, I have finally tracked down the problem. That stratum 2 server
ntp.ubc.ca stinks. I got a gps device with a PPS output, which I hooked up
to a couple of the machines. The most interesting is string, which had some
of the most unstable behaviour with chrony and ntp.ubc.ca. In the following
graph, I have plotted the response of string (with chrony switched off) to
the gps clock, to ntp.ubc.ca and to tick.usask.ca, a stratum 1 server.
The huge regular sawtooth waves come from ntp.ubc.ca. Not only is that
system on average about 3ms fast, its offset varies regularly.
tick.usask.ca is very much better behaved: considering that it is almost
10 msec away (peer delay), it differs from the gps time by only a few tens
of microseconds. (The "line" across the top is the gps time, with a width,
ie a jitter, of about 3 microseconds. The jagged line starting at 24 hr is
tick.usask.ca, while the huge oscillation is ntp.ubc.ca, a supposed
stratum 2 source.) It may be that because ntp.ubc.ca is running SunOS, the
kernel cannot regulate the system clock properly, leading to this behaviour.
(Note that in each case exactly the same overall drift has been removed
from the data, ie the drift was determined from the GPS clock and then the
same drift was removed from each of the other graphs.)
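The drift removal described in the note can be sketched as follows: fit a straight line to the GPS offsets, then subtract that same slope from every series. This is a minimal sketch with made-up numbers; the helper names fit_slope and remove_drift and all of the data are hypothetical, not the script actually used for the graphs.

```python
def fit_slope(times, values):
    """Least-squares slope of values against times."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    if den == 0:
        raise ValueError("sample times must not all be equal")
    return num / den


def remove_drift(times, values, slope):
    """Subtract a linear drift (slope * time) from a series."""
    return [v - slope * t for t, v in zip(times, values)]


# Hypothetical data: time in hours, offsets in microseconds.
hours = [0.0, 6.0, 12.0, 18.0, 24.0]
gps_offsets = [0.0, 60.0, 120.0, 180.0, 240.0]
ntp_offsets = [3000.0, 3060.0, 3120.0, 3180.0, 3240.0]

# The drift is determined from the GPS series only...
gps_slope = fit_slope(hours, gps_offsets)
# ...and exactly the same drift is removed from every series.
gps_flat = remove_drift(hours, gps_offsets, gps_slope)
ntp_flat = remove_drift(hours, ntp_offsets, gps_slope)
```

Because the same slope is subtracted everywhere, any difference that remains between the series reflects the sources themselves rather than the local clock's drift.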
What is interesting is that while the gps spikes are all late (by a few
microseconds), both the ntp sources are early. This seems to imply that the
outbound ntp packets take slightly longer than the inbound packets.
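To see why unequal path delays show up as an apparent offset, here is a minimal sketch of the standard NTP calculation from the four packet timestamps. It assumes the two directions take equal time, so any asymmetry biases the result by half the difference. The helper simulate_exchange and the 6 ms / 4 ms split are hypothetical, chosen only to give a 10 msec round trip, and the sign convention used in the plots is not reproduced here.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP estimates from one request/reply exchange.

    t1: client send time (client clock)
    t2: server receive time (server clock)
    t3: server send time (server clock)
    t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, d_out, d_in, t1=0.0, server_hold=0.0):
    """Hypothetical exchange: the server clock is ahead of the client by
    true_offset, the request takes d_out and the reply takes d_in (seconds)."""
    t2 = t1 + d_out + true_offset
    t3 = t2 + server_hold
    t4 = t3 - true_offset + d_in
    return t1, t2, t3, t4


# Clocks in perfect agreement, but the outbound leg is 2 ms slower than the return leg.
offset, delay = ntp_offset_and_delay(*simulate_exchange(0.0, 0.006, 0.004))
print(offset, delay)
```

In this made-up case the two clocks agree exactly, yet the computed offset is about 1 ms, which is half of the 2 ms difference between the two legs.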
On Apr 14 all of the machines except dilaton and string were changed to get
their primary time from string, which gets its time from tick.usask.ca.
Dilaton got its time from time-nw.nist.gov, a time server located at
Microsoft, but was switched to string on Apr 15.
In August, string was switched to running ntp with a Garmin 18LVC gps
receiver delivering PPS signals to ntp. The accuracy of string then became
of the order of a microsecond.
In Nov, the bottom graphs were added. These give the measured offsets and
round trip delay times for string as the stratum 0 source, as measured from
each of the machines. The large (up to 1 sec) round trip times seem to be
due to problems with the switches installed in Physics (Cisco Gigabit
switches), which seem to insert latencies of up to 2 seconds in routing the
ntp packets between the various machines and string. monopole, charge,
gauge, boson, dilaton, flory, info and fluxon are all on the same set of
switches, so the delays come from single switches.
This is especially obvious in the week ending Feb 18. Some of the machines
have huge (10ppm) fluctuations in the rate, and at exactly the same time,
others (eg charge) are running in the .2 ppm range of fluctuations.
Ie, these fluctuations are not coming from the source ntp.ubc.ca. They seem
to be inherent in the way chrony is setting the rates.
Since the time between comparisons of the system clock with the NTP server is of the order of 100-1000 sec (peer delay is typically .6ms), the rate noise in the case of the best system would correspond to a drift of less than a millisecond.
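The arithmetic behind that estimate is just rate noise multiplied by the interval between comparisons. A minimal sketch, using the .2 ppm and 10 ppm figures and the 100-1000 sec interval from the text (the function name is hypothetical):

```python
def drift_between_updates_ms(rate_noise_ppm, interval_s):
    """Time error (in milliseconds) accumulated over interval_s seconds
    by a clock whose rate is off by rate_noise_ppm microseconds per second."""
    drift_seconds = rate_noise_ppm * 1e-6 * interval_s
    return drift_seconds * 1e3


for ppm in (0.2, 10.0):
    for interval in (100.0, 1000.0):
        print(ppm, interval, drift_between_updates_ms(ppm, interval))
```

At .2 ppm the accumulated error over 1000 sec is about 0.2 ms, while at 10 ppm it is about 10 ms over the same interval.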
Machine    Processors                      RAM      Total Bogomips
inflaton   One 450MHz Intel Pentium III    128M       903.19
doublet    Two 450MHz Intel Pentium III    256M      1805.35
dilaton    One 750MHz Intel Pentium III    256M      1498.05
gauge      One 750MHz Intel Pentium III    256M      1498.00
monopole   One 750MHz Intel Pentium III    384M      1498.05
charge     One 935MHz Intel Pentium III    384M      1872.92
orbit      One 935MHz Intel Pentium III    256M      1872.86
string     One 1.6GHz Intel Pentium 4      512M      3194.28
fluxon     One 2.67GHz Intel Pentium 4     0.99GB    5339.53
boson      Two 2.8GHz Intel Pentium 4      0.98GB   11179.02
info       Two 3GHz Intel Pentium 4        0.99GB   12008.29
flory      Two 3GHz Intel Pentium D        1GB      12008.64
These rate fluctuations do not represent the actual clock accuracy (in general chrony keeps the clocks to within a millisecond or less), but they do represent the stability of the onboard system clock (driven from the bus frequency) and to some extent of the real time clock. Because chrony measures the real time clock against the system clock, an unstable system clock would produce an apparently unstable real time clock. In general the RTC seems to be more stable than the system clock (the correlated fluctuations in the system clock and RTC rates would suggest that a fair amount of the apparent RTC instability comes from the system clock, rather than from the RTC itself).
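The following toy model shows how a noisy system clock can make a quiet RTC look unstable when the RTC is measured against the system clock. It is a minimal sketch under a stated assumption: the measured RTC rate is taken to be the RTC's own rate minus the system clock's rate. Under that assumed convention the two series come out anti-correlated; the sign in the actual plots depends on how the rates are reported, so only the magnitude of the correlation matters here. All numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical rate fluctuations in ppm: a noisy system clock and a quiet RTC.
sys_rate = [1.0, -2.0, 3.0, -1.0, 0.5, -1.5]
rtc_own_rate = [0.1, 0.0, -0.1, 0.1, 0.0, -0.1]

# Assumed measurement model: the RTC is only ever seen relative to the system clock.
rtc_measured = [r - s for r, s in zip(rtc_own_rate, sys_rate)]

r = pearson(sys_rate, rtc_measured)
print(r)
```

Even though the RTC's own fluctuations are ten times smaller in this example, the measured RTC rate tracks the system clock almost perfectly in magnitude.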