MiFID2 and security – keeping track of the money

1024px-Quentin_Massys_001

A shorter version of this post is on the  FSMLabs web site.  MiFID2 is a new set of regulations for the financial services industry in Europe that includes a much more rigorous approach to timestamps.  Timestamps are in many ways the foundation for data integrity in modern processing systems – which are distributed, high speed, and generally gigantic. But when regulations or business or other constraints require timestamps to really work, the issues of fault tolerance and security come up. It doesn’t matter how precise your time distribution is if a mistake or a hacker can easily turn it off or control it.

TimeKeeper incorporates a defense-in-depth design to protect it from deliberate security attacks and errors due to equipment failure or misconfiguration. This engineering approach was born out of a conviction that precise time synchronization would become a business and regulatory imperative.KeystoneCops

  1. Recent disclosures of still more security problems in the NTPd implementation of NTP show how vulnerable time synchronization can be without proper attention to security. PTPd and related implementations of the PTP standard have similar vulnerabilities.
  2. Security and general failure tolerance should be on the minds of firms that are considering how to comply with the MiFID2 rules because time synchronization provides both a broad attack surface and a single point of failure unless properly implemented.

The first step towards time non-naive time synchronization is a skeptical attitude on the parts of IT managers and developers. Ask the right questions at acquisition and design time to prevent unpleasant surprises later.

One of the most dangerous aspects of the just disclosed NTPd exploit is that NTPd will accept a message from any random source telling it to stop synchronizing with its actual time sources. Remember, NTPd is an implementation of NTP, other implementations may not suffer from the same flaw. That d is easy to overlook, but it’s key. TimeKeeper’s NTP and PTP implementations will, for example, ignore commands that do not come from the associated time source and will apply analytical skepticism to commands that do appear to come from the source. TimeKeeper dismisses many of these types of attacks immediately and will start throwing off alerts to provoke automated and human counter-measures. The strongest protection TimeKeeper offers, however, comes from its multi-source capabilitiesthat allow it to compare multiple time sources in real-time and reject a primary source that has strayed.

Correct time travels a long, complex path from a source such as a GPS receiver or a feed like the one British Telecom is now providing. Among the questions system designers need to ask are the following two.

  1. Is the chain between source and client safeguarded comprehensively and instrumented end-to-end?
  2. Is there a way of cross-checking sources against other sources and rejecting bad sources?

Without positive answers to both of these questions, the time distribution technology is inherently fragile and robust MiFID2 timestamp compliance will be unavailable.

The painting is: “Quentin Massys 001” by Quentin Matsys (1456/1466–1530) – The Yorck Project: 10.000 Meisterwerke der Malerei. DVD-ROM, 2002. ISBN 3936122202. Distributed by DIRECTMEDIA Publishing GmbH.. Licensed under Public Domain via Commons – 

MiFID2 Timestamp regulations

800px-Hans_Holbein_der_Jüngere_-_Der_Kaufmann_Georg_Gisze_-_Google_Art_Project

There are a number of places in the new guidelines that increase the rigor required for timestamping data. One key part covers SI’s (systematic internalizers) who operate kind-of like private exchanges. TimeKeeper’s ability to produce traceable audit and to use multiple sources is designed for precisely this kind of application.

Moreover, the inclusion of the timestamp in the pre-trade information published by the SI is a key information for the client to better analyse ex-post the quality of prices quoted by SIs, and in particular to assess with accuracy the responsiveness of the SI and the validity periods of quotes. Without a timestamp assigned by the SI itself, market participants would need to rely on the information potentially provided by data vendors, the timestamps of which would be less accurate, especially when quotes are published through a website as pointed out by some respondents to the question on access to the quotes of SIs

Image is by Hans Holbein the Younger (1497/

RISC-V and Bloat

Went to UT today to listen to David Patterson speak about the open RISC  instruction set architecture and processors he and his colleagues are developing.  As a software developer, hearing about free hardware makes me giddy with joy, perhaps with a bit too much of a “let’s see how you like it” edge. It’s interesting you can get current generation processors manufactured in small numbers for really modest amounts of money now.  Patterson correctly objects to the kitchen sink nature of Intel and ARM processors now but I suspect he will run into the traditional problem from David Parnas’s software jewels paper.

Annals of unintentional irony

Come back after a trip to see Marc Andreessen’s team of twitter posters complaining wryly about how government is so gosh durn big and citing experts like Milton Friedman.

//platform.twitter.com/widgets.js

That is, Marc Andreesen, the former National Center for Supercomputing Applications programmer who helped write the Mosaic Web Browser for the WWW that CERN scientists developed on top of the Internet technology that DARPA and NSF researchers and bureaucrats created on top of prior government funded technology like integrated circuits, and then built a company that made him rich with the team that worked together at NCSA,  engaging in the typical rich person bitching about the gummint.  Which shows how little reality matters to people.

The Enterprise Profile for PTP and TimeKeeper

One of the most interesting things we saw in the proposed IEEE 1588 enterprise profile was a bold suggestion on fault tolerance that looked familiar. Here’s FSMLabs press release from September 2011

TimeKeeper 5.0 offers the ability to monitor multiple time distribution channels, even those operating on different time distribution standards or of different quality due to distance or network issues. As an example, a TimeKeeper client may monitor two different Precision Time Protocol (PTP) “master clocks” and three different Network Time Protocol (NTP) servers. In addition, if the time quality of TimeKeeper’s primary sources becomes questionable, TimeKeeper can now switch from tracking one time source to another, according to a fail-over list provided at configuration time.

This press release described products that were already in the field in production. I remember that although customers liked this capability, talks at timing conferences often provoked complaints from engineers who insisted that the PTP “Best Master Clock” protocol already solved the problem. Anyways, it was gratifying to see that by February 2015 a similar, scaled down, capability was being proposed for the PTP Enterprise Profile. 

Clocks SHOULD include support for multiple domains.  The purpose is to support multiple simultaneous masters for redundancy. Leaf devices (non-forwarding devices) can use timing information from multiple masters by combining information from multiple instantiations of a PTP stack, each operating in a different domain. Redundant sources of timing can be ensembled, and/or compared to check for faulty master clocks. The use of multiple simultaneous masters will help mitigate faulty masters reporting as healthy, network delay asymmetry, and security problems.  Security problems include man-in-the-middle attacks such as delay attacks, packet interception / manipulation attacks. Assuming the path to each master is different, failures malicious or otherwise would have to happen at more than one path simultaneously. Whenever feasible, the underlying network transport technology SHOULD be configured so that timing messages in different domains traverse different network paths.

Note that there are three things missing from this proposal that were in TimeKeeper 5.0 back in 2011: the ability to use NTP sources as well as PTP, the ability to use multiple PTP sources in the same domain, and working software. Stating “SHOULD” in a standard is a long way from “works in the field” but recognition of the problem is a good step.

 

What does the UNIX file system do?

Unix, Linux, Windows and other operating systems and the world wide web all support file systems with the familiar path file names  like

 "/home/snowden/travel/boardingpass.pdf"
 or "/system/passwords/secret/dontread.txt"

although sometimes with different separator characters between the individual “flat” file names. For example, Windows uses “”. As long as we know how to separate flat file names in the sequence, it doesn’t matter. The flat file names are chained together in a path through the file system that shows “where” a file can be found. URL’s in the world wide web are just path file names with some more information around them.  Constructing the file system involves a clever technique for embedding a tree in a simpler file system where file names are just numbers.

For historical reasons, the base file system uses numbers called “inode numbers” to name files.  Ignoring modifications, this file system looks like a function F:InodeNumbers → FileData. The tree emerges from information stored in some of the files. FileData includes some files that are just data and some files that are maps called “directories” (or “folders”). Directory maps have the form d: SimpleFileNames → InodeNumbers.  If we have a path file name “a/b/c” and a starting inode number i, we can first get d1 = F(i), the contents of file i which should be a directory, then get ia= d1(a) the inode number of the file named a, and then da= F(ia) and  ib= da(b) and db= F(ib) and  ic= db(c)  and then the contents of the file “a/b/c” is F(ic ) – assuming that the path is defined.  More concisely, we can write  ia= F(i)(a) and  ib= F(ia)(b) and so on where functions are resolved left to right: for example, F(i)  is a map which is then applied to a. 

Computing the translation of a path file name to an inode number can be defined recursively in terms of a function usually called namei (for names to inode numbers).  If the path file name is the empty path, then we are already where it leads: namei(i,Empty) = i.  If the path file name is not empty, it has the form a/p  where  is a simple file name (of any length) and  is a path file name with one less simple file name in it than the original path: namei(i,a/p) = namei(F(i)(a),p) It’s possible that namei(i,p) is not defined – for example, F(i) might not even be a directory function or it might be one but d=F(i) might not be defined on the leftmost simple file name in the path. In that case, we have “file not found” or “404” in the case of a URL.

A UNIX type file name has a special inode number for the “root” directory.  For any path  file contents is then U(p)= F(namei(root,p)).  A consistent file system will have at least the following properties.

  1. No orphans. For every  in InodeNumbers,  if F(i)  is defined there must be a path  so that namei(root, p) = i. 
  2. No dangling references. For every so that F(i)  is a directory function and for ever simple file name so that F(i)(a)  is defined, F( (F(i))(a)) must also be defined (that is, if F(i)=d  and d(a)=j  it must be the case that  F(j) is defined.)  

Another useful property limits cycles or loops through the file system and aliases. If U(p)  is a directory, let Children(p) = {a: U(p)(a) is defined} where  is a variable over flat file names.  Then define find(p) = {p} if is not a directory or Children(p) = emptyset and define find(p) = union{find(pa): a in Children(p)} . If there are no loops, this is a well defined function that terminates with the set of leaf nodes reachable from p. For example if one were in an organization concerned about security, there might be regular monitoring of find(/home/snowden) to see if any unauthorized data had been collected.

The most stringent non-alias requirement would be that if namei(root,p) = namei(root,q) then p=q. There can be no loops if there are no aliases. This requirement is usually relaxed to accommodate the “parent” and “self” pseudo file names, and hard and soft links. The simple file name “.” is usually reserved to mean “self” so that if F(i)  is a directory F(i)(“.”) = i. The pseudo-file-name “..” is used for “parent” so that if F(i)(a)=j  and F(j) is also a directory, then F(j)(“..”) = i. These pseudo-file names introduce both loops and aliases so we could just limit the requirement for no aliases to the cases where and  don’t contain any pseudo-file-names. Note that the definition of the parent pseudo-file-name limits many kinds of loops because it cannot be that a directory points back at two different parents.

Soft links, a later addition to UNIX files, are a more complex problem. For soft links we add file contents that are path file names and modify namei  so that if j=F(i)(a)  is a soft link with F(j)=q, then namei(i,a/p) = namei(root,concat(q,p)). The original definition of namei has a nice property that the path shrinks by one flat name at every step and this change loses that property and makes it easy to create loops that never finish. The solution to that is to count soft links and just give up if a path takes us to more than some set limit number of soft links.