The first time I saw this method was when I went to work for Parallel Computer Systems, , later called Auragen, in the famous tech startup center of Englewood Cliffs, New Jersey. I commuted there from the East Village. (True story: I applied for the job after finding an advert in a discarded copy of the NY Times on the floor of a Brooklyn apartment while visiting friends. I sent via US mail a resume typed on a manual typewriter- I’m tempted to say “composed by the light of a tallow candle” but that would be over the top- and forgot to send the second page. )
The company built a parallel computer based on Motorola 68000s with a replicated message bus. The bus guaranteed message delivery to 3 destinations would either succeed to all three or fail to all three. This property is called “reliable broadcast”. All interprocess communication was by message transfer (a fashionable idea at the time). Each process had a backup. Whenever a primary process sent a message, the message was also delivered to the backup and to the destination backup. If the primary failed, the backup could be run. The backup would have a queue of messages received by the primary and a count of messages sent by the primary. When the recovering backup tried to transmit a message, if the count was greater than zero, the count would be decremented and the message discarded because it has already been transmitted by the old primary. When the recovering secondary did a receive operation, if there was a message on the input queue, it would get that message. In this way, the recovering backup would repeat the operations of the primary until it caught up. As an optimization, the primary could be periodically checkpointed and queues of duplicated messages could be discarded.
The operating system was an implementation of UNIX. In practice, it was discovered that making each UNIX system call into a message exchange, which was an idea advocated in the OS research community at the time, caused serious performance problems. The replicated state machine operation depended on this design in order to make the state machine operation deterministic. Suppose the primary requested, for example, the time and then made a decision based on the time. A recovering secondary would need exactly the same time to guarantee that it produced the same results as the primary. So every interaction between application and OS needed to be recorded in a message exchange. But a message exchange is nowhere near as fast as a system call (unless the OS developers are horrible).
The performance issue was mitigated by some clever engineering, but was a problem that was discovered in parallel by a number of development teams working on distributed OS designs and micro-kernels which were in vogue at the time. Execution of “ls -l” was particularly interesting.
Anyways, here’s the description from the patent.
To accomplish this object, the invention contemplates that instead of keeping the backup or secondary task exactly up to date, the backup is kept nearly up to date but is provided with all information necessary to bring itself up to the state of the primary task should there by a failure of the primary task. The inventive concept is based on the notion that if two tasks start out in identical states and are given identical input information, they will perform identically.
In particular, all inputs to a process running on a system according to the invention are provided via messages. Therefore, all messages sent to the primary task must be made available to the secondary or backup task so that upon failure of the primary task the secondary task catches up by recomputing based on the messages. In essence, then, this is accomplished by allowing every backup task to “listen in on” its primary’s message.
United States Patent 4,590,554 Glazer , et al.May 20, 1986
See also: A message system supporting fault tolerance.
and a very similar later patent.