I have a blog post up at fsmlabs.com about our TimeKeeper software for time synchronization. TimeKeeper is currently aimed at financial trading markets, but we also hope to market it to electric power distribution and transmission engineers who have a similar need for precise time synchronization within substations and for instrumentation. There are also applications in databases for someone with a little interest in innovation.
TimeKeeper really builds on what our experience with RTLinux taught us about barriers to use. TimeKeeper installs simply – no developer needed, it’s just an app; it requires nearly no configuration; and it is invisible to application code except that it makes sure applications get accurate time when they ask for it.
Real-time operating systems are either a solved problem or a backwater of engineering design. Threads, semaphores, mutexes, some basic I/O, priority scheduling: all of this has been more or less standardized in the POSIX 1003.13 smaller profiles (51, 52) for many years. The basic programming model has not changed in years. Even FSM’s original RTOS and QNX, the two most unusual RTOSes, are pretty similar from a programming point of view, except for the split between real-time and non-real-time in our old product. My suspicion is that the programming model provided by these RTOS designs can be replaced by something better, mostly because I think synchronization is a painful and error-prone exercise that only gets worse as systems become more complicated. On the other hand, although many of our customers really loved it, and I think it was a huge advantage, the split-mode approach in the old RTLinux* was a difficult learning experience for a lot of people.
Consider a simple design:
- Task A: collect data from an Analog/Digital converter, taking one sample every 1/t seconds and filling a small buffer that holds (1/t)*n seconds of data.
- Task B: aggregate the data from Task A and produce 3 streams of processed data that are functions of the input data and some settings.
- Task C: produce control data from the processed data, with one output every 1/t2 seconds.
- Task D: provide a secure internet port that processes requests to interrogate data, to set up a data push at a maximum rate of 1/t3 pushes per second, and to update the settings, with a worst-case delay of 1/t4 seconds from arrival of the input packet to reset.
- Task E: operate a touch screen display that has some of the same functionality as Task D.
That seems like it should be straightforward, but it’s not: it’s a one-year project that has a 70% chance of failing. Making sure shared data is efficiently and correctly shared, validating that the scheduling and timing are OK, and optimizing for hardware limits and power are all hard. And that seems wrong because there is nothing in such a project that has not been done thousands of times before. As the electric power industry stumbles into modern software-based control, the slow development times and poor reliability of products developed under this model are a serious problem.
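To make the "correctly shared" part concrete, here is a minimal sketch of one way the hand-off from Task A to Task B could be done without a lock: a single-producer/single-consumer ring buffer built on C11 atomics. The names (`sample_ring`, `ring_push`, `ring_pop`), the 16-bit sample type, and the capacity of 8 are all assumptions made up for illustration, not part of any product or of the design above. It works only because exactly one task writes each index.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; must be a power of two so index masking works. */
#define RING_CAPACITY 8u

/* Single-producer/single-consumer ring buffer.
 * Only Task A writes `head` and only Task B writes `tail`,
 * so neither side ever needs to take a lock. */
typedef struct {
    int16_t samples[RING_CAPACITY];
    atomic_size_t head;   /* count of samples written by Task A */
    atomic_size_t tail;   /* count of samples consumed by Task B */
} sample_ring;

static void ring_init(sample_ring *r)
{
    assert(r != NULL);
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Called by Task A only. Returns false if the ring is full,
 * in which case the sample is not stored. */
static bool ring_push(sample_ring *r, int16_t sample)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_CAPACITY)
        return false;
    r->samples[head & (RING_CAPACITY - 1u)] = sample;
    /* Release: the sample is visible before the new head is. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Called by Task B only. Returns false if the ring is empty. */
static bool ring_pop(sample_ring *r, int16_t *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->samples[tail & (RING_CAPACITY - 1u)];
    /* Release: the slot is free for reuse only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

Even this small piece needs care about memory ordering and about who owns which index, and it covers only one of the four or five data paths in the design.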
* NOTE: RTLinux is a trademark of WindRiver Systems.
[note: edited to remove garbage characters]
Updated rough draft available with thrilling descriptions of atomic compare and swap and some comments on “formal methods”. Bonus photo
Synchronization is hard in real-time applications, but not as hard as people imagine. If you follow a few simple rules you can make it manageable.
- Never force priority and mutual exclusion to fight each other. You can’t mean “Task A is more important than Task B” and “Task B should be able to lock Task A out of some data structure as long as it wants” at the same time.
- Long critical sections are sure signals of bad design. Use a simpler data structure or a client/server architecture or something.
- Stick to two or three mechanisms. If semaphores and RT-Fifos don’t do the trick, then maybe you should simplify your design.
See my paper for more details.
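As a rough illustration of the short-critical-section rule, here is a minimal sketch using a POSIX mutex. The `settings` struct, its `gain` and `offset` fields, and the function names are hypothetical, invented only for this example. The lock is held just long enough to copy a small struct, and the actual processing runs on a private copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical settings shared between a low-priority configuration
 * task and a higher-priority processing task. */
typedef struct {
    double gain;
    double offset;
} settings;

static settings shared_settings = { 1.0, 0.0 };
static pthread_mutex_t settings_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: the lock is held only for one small struct copy. */
static void settings_update(const settings *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&settings_lock);
    shared_settings = *s;
    pthread_mutex_unlock(&settings_lock);
}

/* Reader side: take a private snapshot, then release immediately. */
static settings settings_snapshot(void)
{
    pthread_mutex_lock(&settings_lock);
    settings copy = shared_settings;
    pthread_mutex_unlock(&settings_lock);
    return copy;
}

/* The processing loop works on the snapshot, outside the lock,
 * so its running time never adds to anyone's blocking time. */
static void process_block(const double *in, double *out, size_t n)
{
    settings s = settings_snapshot();
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * s.gain + s.offset;
}
```

The worst-case time any task can be locked out here is the time to copy two doubles, no matter how large the block passed to `process_block` is.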