The source of error (updated)

Here’s Edsger Dijkstra discussing the birth of the use of axiomatics in computer science – the start of “formal methods” research.  What’s striking is the assumed choice between “axiomatic” and “mechanistic” as if there was no other way. In a later note he writes:

And now we are back at our old dilemma. Either we take by definition all properties of the model as relevant, or we specify in one way or another which of its properties are the relevant ones. In the first case we have failed to introduce in the name of “divide et impera” an interface that would allow us to divide and rule and the study of how we could build upon the (only implicitly defined) interface seems bound to deteriorate into a study of the model itself; in the second case we are again close to the axiomatic method….

[…]

Or, to put it in another way: if the traditional automata theory tends to make us insensitive to the role interfaces could and should play in coping with complex designs, should it then (continue to) occupy a central position in computing science curricula?

And I’m struck by the idea  that seems utterly wrong to me, that one either uses the methods of formal logic OR one is stuck without any ability to abstract or underspecify

The reason mathematics has advanced so much was not because of the Euclidean axioms-lemma-theorem straitjacket, but in spite of it. Luckily, when we actually discover mathematics, we do it the Babylonian way, empirically and algorithmically. It is only when it is time to present it, that we put on the stifling Greek formal attire.

so says Doron Zeilberger

UPDATE: I have a draft of the “process algebras considered harmful” note up.

Recursion and state

automata 071 Despite some deep results, algebraic automata theory has fallen out of favor in theoretical computer science. Reasons include the disciplines failings such as a love of over-generality, weak mathematical background of people working on “formal methods”, and gap between theoreticians and engineers. But perhaps the key reason is that traditional state machine presentations in terms of state sets and transition maps are awkward and carry too much irrelevant information.

The standard presentation of a Moore machine is  M=(A,X,S,st,δ,γ)  where  “A” is the alphabet of events, “X” is the set of outputs, “st” is the start state,  δ:S x A→ S is the transition function and γ:S → X is the output function.  But  we really don’t care about the state set in all but the simplest state machines. Interesting systems have ridiculously large numbers of states.  Names of states don’t matter.  Extra states don’t matter. Suppose “s” is an element of “S” that is unreachable – no sequence of inputs will drive M to “s” from “st”. In that case what difference does removing “s” from “S” make? Or suppose s and s’ are both reachable but not only is γ(s)=γ(s’) but for any sequence of inputs that can drive the machine from s to a new state, the output of that new state is identical to the output of the state reached by following the same sequence of inputs from s’.  In that case, we could simplify S and δ by removing one of the duplicative states and merging paths.

What’s more interesting is to consider what output a Moore machine will produce from a sequence of inputs: thinking of the Moore machine as a map from input strings to outputs. This is  a function that can be constructed by primitive recursion on strings.  First, extend δ to strings to define a new function Δ. The   empty string that has zero length doesn’t cause any state change Δ(s,empty)=s   and if “w” is a string and “wa” is the string obtained by appending “a” to “w”, then Δ(s,wa)= δ(Δ(s,w),a).  Define  M*(w) = γ(Δ(st,w)). The idea is that M*(w) is the output produced by M in the state reached by following “w” from the initial state.  If you look at M* you see it contains all the interesting information in M -  because the names of states and unreachable states and duplicate states are not interesting. In fact in the 1950s Myhill and Nerode described how to generate a state machine (not necessarily a finite state one) from any function on the set of strings over an alphabet.milenr 001

These functions are much more useful in describing complex state systems than are Moore machines – not least because they can be partially specified when we don’t know or care about all the possible behaviors of the system and because they can be composed in a way to represent any interconnection.

In the next post, I’m going to discuss composition of message passing processes and compare to the “formal methods” approach which is based on the incorrect assumption that automata need to be “enhanced” in some way and that replaces automata with things that have very little mathematical structure at all.

Are threads evil? (updated)

This paper by Prof. Edward Lee explains something of why “threads” are such a painful abstraction.  As Prof. Lee notes, threads intrinsically create non-determinism and resource conflicts which we then attempt to “prune” via synchronization and complex tools. In an earlier note, I argued that we should design real-time multi-threaded applications to minimize the need for synchronization, but Prof. Lee goes further to point out that the thread model itself encourages a disorganized program structure.  Along those lines, one of the basic difficulties in real-time application design is non-deterministic resource allocation. How can we ever be sure that, for example, a multi-threaded app where threads can allocate memory has sufficient memory for critical threads to proceed?

I’m not a fan of the “algebraic” tagged model that Lee suggests as an alternative – too much of the flavor of “formal methods” via the denotational semantics base.  In fact, Liu’s thesis, referenced here, struggles mightly with the unsatisfactory nature of the mathematical framework to get somewhere. Do we really have to create lemmas about posets to describe 2 simple processes? It seems to me that the confusion of the underlying mathematical basis has to be resolved before we can figure this out. Or maybe not.

Consider a really dumb system that keeps a “gain” variable, outputs a single analog voltage, and inputs both an input signal and “messages” that tell it what to do. The input events can be thought of as samples of the analog signals on the “pins”.

Inputs are maps: Pins -> Signals  where each input represents one unit of time and the set Pins contains “Vin” and some unknown number of pins (which may not be actual physical pins) comprising the message port.

Given a sequence of inputs, w, we suppose we have a function LastGain(w) that extracts the value of gain we were told to set in the most recent message and SinceLastCommand(w) that counts samples (time units) since the last command message.  Let’s be more specific on the “Vin”

LastVin(wa) = a(Vin).

StableVin(wa) = (1+ StableVin(w))*[a(Vin) ==  LastVin(w)]  where [exp] is 1 if exp is true and 0 otherwise

StableVin(emptysequence)=0

Then we can require that Dev(w)= g*v if

StableVin(w)>= t1 and v=LastVin(w)

and g=LastCommand(w) and SinceLastCommand(w) >= t2

Dijkstra versus Perlis (updated)

Here’s Dijkstra

He [Perlis] published a very obnoxious paper arguing against a mathematical approach to programming cite

Here’s the paper by De Millo, Lipton and Perlis. It starts as follows:

Many people have argued that computer programming should strive to become more like mathematics. Maybe so, but not in the way they seem to think. The aim of program verification, an attempt to make programming more mathematics-like, is to increase dramatically one’s confidence in the correct functioning of a piece of software, and the device that verifiers use to achieve this goal is a long chain of formal, deductive logic. In mathematics, the aim is to increase one’s confidence in the correctness of a theorem, and it’s true that one of the devices mathematicians could in theory use to achieve this goal is a long chain of formal logic. But in fact they don’t. What they use is a proof, a very different animal. Nor does the proof settle the matter; contrary to what its name suggests, a proof is only one step in the direction of confidence. We believe that, in the end, it is a social process that determines whether mathematicians feel confident about a theorem–and we believe that, because no comparable social process can take place among program verifiers, program verification is bound to fail.

To me, the problem with Dijkstra is that he was so sharp and such a good writer that he was able to make persuasive cases out of wrong ideas. Dijkstra wanted to be a scientist in the model of theoretical physics, not an engineer. I’m pretty confident that Dijkstra was wrong: programming is engineering – in fact, physics is not as far from engineering as some people would like to believe. I’m not a huge fan of the engineering discipline as it exists in the USA. It has all sorts of negative aspects – including those Dijkstra railed against. But the vision of a programmer as, not a mathematician, but a formal logician flying far above the grubby compromises and trade-offs of mere engineering in a platonic bubble of pure reasoning is wrongheaded.

Dijkstra published some criticism of the Demillo paper at the time and in their response the authors stated something that, as far as I know, remains true

We must begin by refusing to concede that our confidence in a piece of
real software has ever been increased by a proof of its correctness

When I was in graduate school, a famous formal methods scholar came for a talk and explained to us that formal methods were needed if we were ever going to develop fault tolerant software. I pointed out that, for example, the Tandem Software worked pretty well in practice. “It cannot”, retorted the famous scholar.

So there.

Meaning of concurrent programs and IP

Most of the new draft of the Concurrent Programs paper has to do with trying to specify problems and solutions in synchronization via an atomic  “compare and swap” operation. Even these operations are surprisingly complicated once put under the microscope – or not so surprisingly complicated if you think about the details of using or implementing them. But at the end of the paper, I start to describe the fundamental difference in approach between this work and “formal methods”  in terms of how we can view a program. Dijkstra seems to me to have made an error by insisting that we should think of a program as a mathematical object and a programmer as a mathematician of sorts. A program is more of a manufactured object.

Even though it has no weight and is invisible, a program is device of sorts.  If we think of a program as a mathematical object, the methods of formal logic – of meta-mathematics – are the natural methods to use. If we think of a program as a device, like a piano or a bathtub or an automobile, then what mathematicians call “foundational questions” are far away. People are very resistant to the idea of a program as a “thing” for the obvious reasons, but mathematical objects don’t have the weird properties and defects of manufactured objects (or physical objects for that matter). A ball bearing is not a sphere and an implementation of a stack is not an ordered sequence. Recursion in programs is different from recursion in mathematical functions: f(0):=1, f(x+1):= (x+1)*f(x) is  a complete definition of a mathematical object but only an approximation of some of the properties of the program written the same way. Any programmer will know that f(10000) almost certainly fails – and that’s an important part of the specification of the program.

The confusion between programs and mathematical objects is a pervasive obstacle in fields as apparently unrelated as program verification, software pricing, and patent law. In program verification we bog down in foundational methods of formal logic because that is the obvious tool to study mathematical objects, but it’s certainly not the obvious tool for doing mathematics or describing manufactured objects.   In software pricing, the resistance of manufacturers to let mere software figure in to costs in terms of how complex it is and how much it adds to value rather than by weight repeatedly leads to product development schedules that invest too little time and/or money in the most important components. In patent law, confusion between unpatentable mathematical methods and technology for programs is used to deny software innovators the same rights that are uncontroversially granted to innovators who design cardboard boxes