Without this information, a time distribution network is a black box. TimeKeeper looks at all the sources of time, in any protocol, that it can see around it, and traces each source back to its origin. These maps, especially in larger and older firms, turn up all sorts of interesting things. They find “redundant” time sources that are not actually redundant. They find odd cycles that can cause naive NTP implementations to oscillate between sources. They can find unexpectedly long legs (edge length corresponds to one-way delay). In one corporate network they revealed a GPS source that was, to the surprise of its users, failing over to a modem connected to a trans-Atlantic feed. These pictures are a way of creating actionable information from reams of difficult-to-understand data.
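The mapping idea above can be sketched in a few lines: follow each machine’s time feeds upstream until you reach a root source, flagging cycles along the way. The topology, node names, and delays below are purely hypothetical illustration, not TimeKeeper’s actual representation.

```python
# Directed edges: consumer -> [(upstream source, one-way delay in ms)].
# Hypothetical topology: two "redundant" NTP servers that in fact
# trace back to the same single GPS root.
feeds = {
    "app-server-1": [("ntp-a", 0.4), ("ntp-b", 0.5)],
    "app-server-2": [("ntp-b", 0.3)],
    "ntp-a": [("gps-1", 0.1)],
    "ntp-b": [("gps-1", 0.2)],
}

def roots(node, seen=None):
    """Follow a node's feeds upstream and return its ultimate root
    sources. Revisiting a node on the current path marks a cycle."""
    seen = set() if seen is None else seen
    if node in seen:
        return {"<cycle via %s>" % node}
    if node not in feeds:
        return {node}  # no upstream feed: this is a root source
    seen = seen | {node}
    out = set()
    for upstream, _delay in feeds[node]:
        out |= roots(upstream, seen)
    return out

for consumer in ("app-server-1", "app-server-2"):
    print(consumer, "->", sorted(roots(consumer)))
# Both consumers trace to the single root "gps-1", exposing the
# supposedly redundant pair as a single point of failure.
```

With real data, the same traversal also surfaces cycles and unusually long legs, since each edge carries its measured one-way delay.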
In a distributed compute system, such as any multi-device transaction system or database, time synchronization is essential to data integrity. The simplest case is a multi-step transaction over multiple compute devices – something that is common to a wide range of applications. Consider a financial trading application where machine A gets a tick containing a price change, machine B sends a bid out to an exchange via machine X that does some sanity/safety checks, machine D gets a confirmation of the trade, and machine E reconciles the book. We distribute this computation over five machines because we need the compute and I/O bandwidth (both network and storage I/O), because the system needs to be able to continue to operate even if machines go down, and because different machines may have different advantages. Without authoritative time stamps, we cannot serialize this single transaction in the record or log. We can’t analyze performance to see where there are bottlenecks. We can’t catch emerging problems before failure. We don’t have a sensible forensic log.
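A small sketch makes the point concrete: if every machine stamps its log records with an authoritative clock, the five per-machine records can be merged into one serialized transaction and the latency of each hop measured. The machine names, step labels, and timestamps below are made up for illustration.

```python
# Hypothetical log records (machine, step, timestamp in seconds),
# arriving in arbitrary order from five different machines.
events = [
    ("D", "trade confirmation received", 1.000900),
    ("A", "tick received",               1.000000),
    ("X", "safety check passed",         1.000310),
    ("B", "bid sent to exchange",        1.000250),
    ("E", "book reconciled",             1.001400),
]

# With synchronized clocks the records serialize by timestamp,
# and the gap between consecutive steps locates bottlenecks.
timeline = sorted(events, key=lambda e: e[2])
for (m1, _, t1), (m2, step2, t2) in zip(timeline, timeline[1:]):
    print(f"{m1}->{m2}: {step2:<28} +{(t2 - t1) * 1e6:6.0f} us")
```

If the machines’ clocks disagree by more than the inter-step gaps (hundreds of microseconds here), the sort produces a wrong ordering and both the forensic log and the bottleneck analysis become fiction.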
This paper describes an efficient optimistic concurrency control scheme for use in distributed database systems in which objects are cached and manipulated at client machines while persistent storage and transactional support are provided by servers. The scheme provides both serializability and external consistency for committed transactions; it uses loosely synchronized clocks to achieve global serialization. It stores only a single version of each object, and avoids maintaining any concurrency control information on a per-object basis; instead, it tracks recent invalidations on a per-client basis, an approach that has low in-memory space overhead and no per-object disk overhead. In addition to its low space overheads, the scheme also performs well. The paper presents a simulation study that compares the scheme to adaptive callback locking, the best concurrency control scheme for client-server object-oriented database systems studied to date. The study shows that our scheme outperforms adaptive callback locking for low to moderate contention workloads, and scales better with the number of clients. For high contention workloads, optimism can result in a high abort rate; the scheme presented here is a first step toward a hybrid scheme that we expect to perform well across the full range of workloads.
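The core idea – commit order follows timestamp order, with optimistic backward validation – can be sketched crudely. This is not the paper’s scheme (which tracks invalidations per client, truncates history, and handles distribution); it is a much-simplified single-validator illustration of timestamp-ordered optimistic concurrency control.

```python
# Committed transactions, each recorded as (commit_ts, write_set).
committed = []

def validate(start_ts, commit_ts, read_set):
    """Backward validation: abort if any transaction that committed
    while we were running wrote an object we read, since our reads
    may then be stale."""
    for cts, writes in committed:
        if start_ts < cts < commit_ts and writes & read_set:
            return False
    return True

def try_commit(start_ts, commit_ts, read_set, write_set):
    if not validate(start_ts, commit_ts, read_set):
        return False  # abort
    committed.append((commit_ts, set(write_set)))
    return True

# T1 writes x and commits at ts 2.0. T2 started before that commit,
# read x, and tries to commit later -- it must abort.
assert try_commit(1.0, 2.0, set(), {"x"}) is True
assert try_commit(1.5, 3.0, {"x"}, {"y"}) is False
```

Loosely synchronized clocks matter here because the commit timestamps, taken at different servers, still need to define a sensible global order; the paper shows the scheme tolerates the resulting small skews.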
Synchronizing clocks in the cloud, especially for virtual machines, is way beyond the capabilities of ordinary synchronization methods. Tests show that virtual machines in the cloud relying on NTPd can drift from the reference time by tens of minutes over a single day. Even bare-metal cloud platforms are significantly worse than dedicated machines. This creates an interesting dynamic, because distributed applications are becoming more and more dependent on tight time synchronization, and there are a large number of existing applications that need at least millisecond-level synchronization. In the cloud environment, the design limitations of the alternative time protocol, PTP, mean it cannot be relied upon to provide better performance either. PTP was initially designed to synchronize servos and data acquisition devices on a single shared Ethernet – weaknesses like dependence on multicast (broadcasting) and top-down failover methods are a problem in the general enterprise, but really become impediments in the cloud. That’s why we took a semantic approach, fixing time distribution above the level of the protocol. The low level – bits and packets – cannot deliver the sophisticated time analysis, fault recovery, and management needed for complex environments like the cloud.
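For readers who want to check their own VMs, the drift being measured is just the standard NTP on-wire calculation: from the client send time (t0), server receive (t1), server transmit (t2), and client receive (t3), the clock offset and round-trip delay fall out directly. The numbers below are invented to illustrate a VM whose clock has fallen 50 ms behind.

```python
def ntp_offset_delay(t0, t1, t2, t3):
    """Standard NTP offset/delay calculation (RFC 5905).
    offset: how far the client clock is behind the server's.
    delay:  network round trip, excluding server processing time."""
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay

# Hypothetical exchange: client clock reads 100.000 when true time is
# 100.050 (50 ms behind); ~2 ms each way on the wire, 1 ms in the server.
offset, delay = ntp_offset_delay(100.000, 100.052, 100.053, 100.005)
print(f"offset={offset * 1000:.1f} ms, delay={delay * 1000:.1f} ms")
# offset=50.0 ms, delay=4.0 ms
```

On a virtualized guest, t0 and t3 are the numbers that go bad: the hypervisor can steal the virtual CPU between taking the timestamp and sending the packet, which is one reason naive clients drift so badly in the cloud.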
Some slides on GPS timing vulnerabilities.
Time Synchronization and Distribution is a business-critical issue, but it is easy to become bogged down in arcane technology/marketing controversies. One of those controversies is over the choice between low-level network protocols used to deliver time to application server computers. The most widely used time synchronization protocol in the enterprise is called the Network Time Protocol (NTP), and there is a newer protocol called IEEE 1588 Precision Time Protocol. For most financial trading firms, however, the critical questions are not down in the weeds of network protocols, but at the level of business process: how accurate and reliable time synchronization will be for trading applications, and how much alternative solutions cost to implement and to maintain. TimeKeeper time synchronization technology – hardware appliances, software clients, and software servers – is all protocol agnostic and can operate both protocols concurrently, so that our customers can make decisions based on business logic instead of on technology details.
Customers and network equipment providers often express surprise (or, sadly, skepticism) when we say that we can provide the same deep sub-microsecond accuracy for NTP that we provide for PTP, but it should not be all that surprising. From a network packet technology point of view the two protocols are similar, and some of the same hardware support that helps improve PTP performance is also available for NTP. Each protocol has its advantages and disadvantages that manifest differently depending on user requirements and network architecture. In practice, we find that the most demanding high-precision trading systems benefit from being able to use both protocols to back up and complement each other.
The perception that NTP is an inferior protocol is due to wide use of equipment that implements NTP badly. Network clocks that run NTP on 30-year-old free software designed to keep desktop PCs synchronized to within seconds, executing on underpowered embedded Linux computers with network adapters chosen for lowest cost, will usually have terrible NTP performance. The poor performance is a result of the quality of the implementation and is not intrinsic to the protocol.
TimeKeeper uses sophisticated algorithmic methods, smoothing and filtering technology, and a highly optimized code base to implement both protocols and our own methods for fault tolerance and error detection. We do not rely on any legacy code for either the server side (which sends out time) or the client side (which consumes time from a time server). TimeKeeper software will use hardware timestamping on both protocols where possible. TimeKeeper GrandMaster appliances use exceptionally precise hardware timestamping for both protocols at 10 Gigabits per second or 1 Gigabit per second. Even using TimeKeeper on just one end of an NTP stream will greatly improve time accuracy (and reliability). Using TimeKeeper on both ends – for example, a TimeKeeper GPS GrandMaster as the server and TimeKeeper Client software on the client computer – will dramatically improve performance. Sub-microsecond precision is quite practical (and no tuning or configuration wizardry is needed). And TimeKeeper MultiSource Client software can be set up to consume multiple time sources, some PTP and some NTP.
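The multi-source idea can be illustrated with a toy selection policy: rank the feeds by estimated error, prefer the best one, and reject a primary whose offset grossly disagrees with every other source. This is a hypothetical sketch – the source names, numbers, threshold, and policy are invented, and TimeKeeper’s actual algorithms are more sophisticated than this.

```python
# Hypothetical feeds: a local PTP GrandMaster, a local NTP GrandMaster,
# and a distant NTP pool server. Offsets and error estimates in microseconds.
sources = [
    {"name": "ptp-gm-1", "proto": "PTP", "offset_us": 0.8,   "err_us": 0.3},
    {"name": "ntp-gm-1", "proto": "NTP", "offset_us": 1.1,   "err_us": 0.9},
    {"name": "ntp-pool", "proto": "NTP", "offset_us": 850.0, "err_us": 400.0},
]

def select_source(sources, sanity_us=100.0):
    """Pick the lowest-error source, but fail over if its offset
    disagrees with every other source by more than sanity_us."""
    ranked = sorted(sources, key=lambda s: s["err_us"])
    primary, backups = ranked[0], ranked[1:]
    if backups and all(abs(primary["offset_us"] - b["offset_us"]) > sanity_us
                       for b in backups):
        return select_source(backups, sanity_us)  # primary looks faulty
    return primary

print(select_source(sources)["name"])  # -> ptp-gm-1
```

The point of mixing protocols is visible even in this toy: the NTP feeds provide an independent cross-check on the PTP primary, so a faulted GrandMaster can be detected and survived rather than silently followed.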
Because TimeKeeper is protocol agnostic, network managers can radically upgrade time synchronization without major labor and capital expenditures. PTP prefers to be used in a multicast environment – it was designed for simple networks consisting of a single shared Ethernet. NTP was designed for long-haul networks. An existing time distribution network based on NTP can be upgraded transparently, or even incrementally, by adding TimeKeeper client and server software and GrandMasters as needed. PTP can be turned on and off for parts of the network as business logic dictates. Clients can be configured to get a PTP feed from a local PTP GrandMaster source and to also consume an NTP feed from an existing source. Existing NTP network clocks can even be connected to TimeKeeper servers and converted into high-quality PTP/NTP servers. TimeKeeper adapts to the architecture of the existing network and to the demands of business process; it does not disrupt existing networks. TimeKeeper is designed to assist management who are focused on issues such as what level of accuracy, record keeping, and fault tolerance their trading systems need, rather than on low-level technology details.
Interesting article – most of these problems and more were solved in TimeKeeper years ago. But the most interesting part is the enormous engineering effort required to kind of get PTP to work.