## Process algebra versus state machines part 1

Robin Milner’s influential book Communication and Concurrency involves a take on state machines that has always puzzled me. “Now in standard automata theory, an automaton is interpreted as a language,” Milner writes, “i.e. as a set of strings over the alphabet.” That’s not at all correct, but let’s accept the claim for now and follow the argument. Consider two state machines A and B over an alphabet of events {a,b,c,d}, where A has states {A, A1, A2, A3} and B has states {B, B’, B1, B2, B3}. The state machine transitions can be given by ordered triples (state1, input, state2) that show the input label on a directed edge between state1 and state2. For Milner’s example:

state machine A has transitions { (A,a,A1), (A1,b,A2), (A1,c,A3), (A3,d,A) },

state machine B has transitions: { (B,a,B1), (B,a,B’), (B1,b,B2), (B’,c,B3), (B3,d,B) }.

B is non-deterministic because there are two “a” transitions from state B. Milner points out that if we consider A2 and B2 to be accept states, both machines accept the same language (acd)*ab. So far so good. At this point Milner asks us to think of {a,b,c,d} as “ports” or maybe buttons that can be pushed. The button “is unlocked if the agent can perform the associated action and can be depressed to make it do the action, otherwise the button is locked and cannot be depressed.” Then: “after the a-button is depressed a difference emerges between A and B. For A – which is deterministic – b and c will be unlocked, while for B – which is non-deterministic – sometimes only b will be unlocked and sometimes only c will be unlocked.” If you don’t look carefully, you’ll miss a subtle change of conditions that has significant ramifications.

An experimenter or external agent trying to push these buttons will eventually discover a difference between the two machines, because sometimes after an initial “a” input on the second state machine a “b” is possible and sometimes not, while on the first state machine after an “a” the “b” is always possible. But how does the external agent determine that B will sometimes not perform a “b” action? The external agent “attempt[s] to depress b” and fails – the locked/unlocked state of each button is visible to the external agent. So Milner has changed definitions in the middle of the argument. At the start, finite state machines were language recognizers with, as the classical text on automata theory explains, “output limited to a binary signal: ‘accept/don’t accept’ ” [Hopcroft and Ullman]. Those automata will not tell us anything about a word other than that binary condition – is it in the language or not. But Milner’s button state machines also tell us which buttons are locked and which are unlocked in the terminal state reached by the word. So Milner’s state machines distinguish words that recognizer state machines do not. In fact, these Milner state machines have 5 binary outputs in each state – indicating the locked/unlocked status of each button plus accept/don’t accept. State machines with more than a binary output alphabet are called Moore or Mealy machines in poor old standard automata theory.

Standard automata theory does not “interpret” state machines “as a language”, but there is a theorem that the class of languages recognized by those finite state binary output deterministic state machines is the same as the class of languages recognized by finite state non-deterministic state machines. Two machines that recognize the same language may be distinct in many other ways. And state machines that have additional outputs (sometimes called “transducers”) are essentially descriptions of maps from input strings to output strings, or from input strings to an output value in the terminal state. Standard automata theory would say Milner’s two machines accept the same language of strings but produce different languages of strings.

Standard automata theory, as far as I know, has never really considered the case of non-deterministic Moore machines, but the extension is trivial. Milner’s transition systems are just labeled directed graphs with a root vertex. Consider a labeled directed graph G with label set A, a distinguished root vertex (start state) s0, and the set of triples R = { (s1,a,s2) : there is an edge labeled a from s1 to s2 }. The set of vertices V is the set of states. We can define a relation R* subset of A* x V as the smallest set containing (NULL, s0) and containing (wa, s’) whenever (w,s) is in R* and (s,a,s’) is in R – where wa is the string obtained by appending “a” to “w” on the right. For any vertex “s” define Q(s) to be the subset of A containing every “a” such that (s,a,s’) is in R for some s’ in V. Then let G* be the set of pairs (w, Q(s)) for every (w,s) in R*. As far as I can tell on a quick look, Milner’s bisimulation between G1 and G2 is simply equality of G1* and G2*.
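As a sanity check, the construction can be computed directly for Milner’s two machines. This is my own sketch: the names R_A, R_B, offered, and g_star are mine, and since R* is infinite for graphs with cycles I truncate at a maximum word length.

```python
# Transition relations for Milner's two machines, as sets of
# (state, label, state) triples taken from the text above.
R_A = {("A", "a", "A1"), ("A1", "b", "A2"), ("A1", "c", "A3"), ("A3", "d", "A")}
R_B = {("B", "a", "B1"), ("B", "a", "B'"), ("B1", "b", "B2"),
       ("B'", "c", "B3"), ("B3", "d", "B")}

def offered(R, s):
    """Q(s): the set of labels enabled at state s."""
    return frozenset(a for (s1, a, s2) in R if s1 == s)

def g_star(R, s0, max_len):
    """G*, truncated to words of length <= max_len (R* is infinite
    when the graph has cycles, so we bound the word length)."""
    reach = {("", s0)}            # truncated R*, with "" for NULL
    frontier = {("", s0)}
    for _ in range(max_len):
        frontier = {(w + a, s2)
                    for (w, s) in frontier
                    for (s1, a, s2) in R if s1 == s}
        reach |= frontier
    return {(w, offered(R, s)) for (w, s) in reach}

GA = g_star(R_A, "A", 4)
GB = g_star(R_B, "B", 4)

# Same truncated trace language...
assert {w for (w, _) in GA} == {w for (w, _) in GB}
# ...but GA != GB: after "a", A offers {b,c} while B offers {b} or {c}.
assert ("a", frozenset("bc")) in GA and ("a", frozenset("bc")) not in GB
```

The two machines agree on which words are possible but disagree on the offered-button sets, which is exactly the difference Milner’s experimenter observes.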

## Theories of abstract automata – Arbib

We have said that automata theory deals with the realization of partial functions F: X* → Y* by some finitely specifiable substrate. Before we specify in more detail the forms (of which the Turing machine is one) of substrate which have figured most prominently in automata theory, it is useful to distinguish on-line machines from off-line machines. An on-line machine is one that may be thought of as processing data in an interactive situation—in processing a string it must yield a continual flow of outputs, processing each symbol completely (albeit in a way dependent on prior inputs) before it reads in the next symbol. This means that the corresponding function F: X* → Y* must have the following special property:

1. For each nonempty string u of X* there exists a function Fu: X* → Y* such that for every nonempty v in X*

F(uv) = F(u) · Fu(v)

that is, the input string u causes the machine to put out the string F(u) and to “change state” in such a way that it henceforth processes inputs according to a function Fu determined solely by F and u. We call a function sequential if it satisfies property 1.
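Property 1 is easy to see concretely. Here is a minimal sketch assuming a toy on-line machine that emits the running parity of its input bits; the names parity_stream, F, and F_u are my own illustration, not Arbib’s notation.

```python
def parity_stream(w, start=0):
    """An on-line machine: for each input bit, immediately emit the
    running parity of all bits read so far."""
    out, state = [], start
    for bit in w:
        state ^= bit
        out.append(state)
    return out

def F(w):
    return parity_stream(w)

def F_u(u):
    """The residual function Fu: the same machine restarted in the
    state reached after reading u."""
    s = 0
    for bit in u:
        s ^= bit
    return lambda v: parity_stream(v, start=s)

u, v = [1, 0, 1], [1, 1, 0]
assert F(u + v) == F(u) + F_u(u)(v)   # property 1: F(uv) = F(u).Fu(v)
```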

## the terrible effect of Krohn-Rhodes

One of the most interesting theorems in computer science is the Krohn-Rhodes theorem, which shows a strong link between basic computer science and group theory. Crudely, the KR theorem extends Jordan-Hölder ($H_1 \triangleleft H_2 \triangleleft \dots \triangleleft H_n = G$) to state machines, but it does it in a way that doesn’t get us very far in understanding the structure of state machines. The problem is that the “division” is too restrictive. In terms of state machines, KR requires us to turn $M$ into $M_1 \rightarrow M_2$ where the input of $M_2$ depends only on the system input and the output of $M_1$. This is called a “loop free” decomposition and Hartmanis-Stearns worked on it before KR teased out the underlying math. You can build a ripple carry adder that way, and it makes sense that the simple devices that concerned early CS researchers seemed amenable to such a decomposition, but a few steps down the path we come to systems that do not want to be decomposed into loop-free series at all. Consider a queue – when a new element is inserted, each element needs the output of its neighbor to the back to get its new value – very nicely loop free. But consider a stack. A push makes each element take an input from its neighbor toward the top and a pop makes each element take an input from its neighbor toward the bottom.

$\langle x_1, x_2, \dots, x_n \rangle \xrightarrow{\text{push } y} \langle y, x_1, \dots, x_{n-1} \rangle$

$\langle x_1, x_2, \dots, x_n \rangle \xrightarrow{\text{pop}} \langle x_2, \dots, x_n, \text{null} \rangle$

In fact the stack is serially loop free – it has a nice loop free ordering per operation – but Krohn-Rhodes doesn’t go there (you could get to loop free by adding a counter to each element).  Anyways, the result is a theory that bounces off the observed or designed structure of computational objects and, at least to me, that designed structure is the interesting structure.
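The two transition rules can be sketched elementwise, as an illustration of why each cell’s update direction flips between the two operations. This assumes a fixed capacity and a null padding value; the function names are mine.

```python
NULL = None   # padding value for empty cells (my convention)

def push(cells, y):
    """push y: cell i takes the value of cell i-1 (nearer the top);
    cell 0 takes y and the bottom element falls off."""
    return [y] + cells[:-1]

def pop(cells):
    """pop: cell i takes the value of cell i+1 (nearer the bottom);
    the last cell becomes null."""
    return cells[1:] + [NULL]

s = [1, 2, 3, NULL]
assert push(s, 9) == [9, 1, 2, 3]
assert pop(s) == [2, 3, NULL, NULL]
```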

## Process algebra reconsidered

Paper is here.
The following incorrect claim is not unusual in the process algebra literature.

Basically, what is missing [in classical automata theory] is the notion of interaction: during the execution from initial state to final state, a system may interact with another system. This is needed in order to describe parallel or distributed systems, or so-called reactive systems. When dealing with interacting systems, we say we are doing concurrency theory, so concurrency theory is the theory of interacting, parallel and/or distributed systems. [Bae05]

Actually, a sophisticated notion of state machine product was developed for representing composition of “interacting” state machines starting in the 1950s [HS66]. A general survey can be found in a monograph by Gecseg [Gec86], Domosi provides a more modern, more algebraic treatment [DN04], and [Yod09] provides practical techniques for construction of complex products.
[…]
The reader familiar with the process algebra literature will note that such an effort must find a way to model the non-determinism that is so fundamental to the process algebra world-view. Since deterministic programs are used to produce pseudo-random number sequences and to model Brownian motion and even the stock market (perhaps not the best example at this date), the problem is easier to solve than it appears at first. Section 4 will include a short discussion on the difference between “real” non-determinism and simulated non-determinism, but certainly there is nothing in the basic axiom set of Milner’s original process algebra that notices this distinction as far as I can see.

To read more see the paper.

## Queues and algebra

Suppose we have a state machine Q that implements a common first-in-first-out queue. The input alphabet of Q consists of “Deq” and “Enq x” where “x” ranges over a set of values, say, V. Let’s fix the max length of Q at some constant k. As usual, given a sequence of events “w”, write Q(w) for the output of Q in the state reached by following “w” from the initial state. For the empty string NULL, define Q(NULL)=(). If a=Deq then define Q(wa) depending on Q(w): if Q(w)=() or Q(w)=(x) then Q(wa)=(); if Q(w)=(x1 … xn) then Q(wa)=(x1 … x(n-1)). If a=Enq x then: if Q(w)=() define Q(wa)=(x); if Q(w)=(x1 … xn) where n&lt;k define Q(wa)=(x, x1 … xn); and otherwise leave Q(wa)=Q(w). The number of states of Q is a function of the number of elements of V and the constant k and grows rapidly as they grow: $\sum_{i=0}^{k} |V|^i$.
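A direct implementation of Q’s definition, as a sketch – the event encoding ("enq", x) / "deq" and the default k are my choices, not part of the definition:

```python
def Q(w, k=3):
    """Output of the queue machine in the state reached by following
    the event sequence w from the empty queue. The output is the queue
    contents, newest first, matching the (x, x1, ..., xn) convention."""
    q = ()                      # Q(NULL) = ()
    for a in w:
        if a == "deq":
            q = q[:-1]          # drop the oldest element
        else:
            _, x = a            # a = ("enq", x)
            if len(q) < k:
                q = (x,) + q    # new element goes in at the front
            # an enqueue on a full queue is ignored
    return q

assert Q([("enq", 1), ("enq", 2)]) == (2, 1)
assert Q([("enq", 1), ("enq", 2), "deq"]) == (2,)
assert Q([("enq", i) for i in range(5)], k=3) == (2, 1, 0)
```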

If you visualize the graph of Q, you can see it contains non-trivial loops – which means that the monoid associated with Q contains groups. The advantage of decomposing Q into a product of smaller machines is that this might guide a smarter implementation. Well, I’m kind of cheating here, because programmers have found a variety of smart implementations that do strip out the group structure.

The easy decomposition produces two counters and a collection of k storage elements that are loop-free state machines: an input “store x” moves the machine to an “x” state. They do contain loops but only trivial ones. The counters are group machines. We number the storage cells 0 to k-1. We can have one mod-k counter that “points” to the tail of the queue and one mod-(k+1) counter that counts the size of the queue. On an enq operation the enqueued value is stored in the storage element identified by subtracting the size from the tail mod k, and the size is incremented mod k+1. An enq operation when size=k is ignored and has no effect. On a deq, the size is decremented mod k+1 and the tail is decremented by one mod k. What we’ve done is construct a system in which the state machine monoids are two cyclic groups and k aperiodic monoids. If we then switch to more complicated data structures, like ordered lists, it’s not that hard to see how a permutation group might be used to manipulate pointers to a pile of storage locations. For example (n1,n2,n3,n4,blank,n5,n6) might be an ordered list with n5 and n6 being free. An enqueue would generate some shuffle to, say, (n1,n2,n5,n3,n4,blank,n6) and then a dequeue would produce (n2,n5,n3,n4,blank,n6,n1).
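The counter-plus-cells decomposition can be checked in code. This is my own rendering of the scheme just described: tail and size are the two counter machines, cells holds the k storage elements, and contents() reads the state back out newest first to match Q’s output convention.

```python
class DecomposedQueue:
    """The queue rebuilt from two counter machines (cyclic groups)
    and k 'store x' cells (aperiodic machines), following the text:
    enq stores at (tail - size) mod k; deq decrements the tail."""

    def __init__(self, k):
        self.k = k
        self.tail = 0           # mod-k counter: position of the front
        self.size = 0           # mod-(k+1) counter
        self.cells = [None] * k

    def enq(self, x):
        if self.size == self.k:
            return              # enqueue on a full queue has no effect
        self.cells[(self.tail - self.size) % self.k] = x
        self.size += 1

    def deq(self):
        if self.size == 0:
            return
        self.size -= 1
        self.tail = (self.tail - 1) % self.k

    def contents(self):
        """Read the state back out newest first, matching Q."""
        return tuple(self.cells[(self.tail - self.size + 1 + i) % self.k]
                     for i in range(self.size))

q = DecomposedQueue(3)
q.enq(1); q.enq(2); q.deq(); q.enq(3)
assert q.contents() == (3, 2)
```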

A nice property of groups is that “undo” always works. So if the data in the storage elements can be marked in some way, a deq or enq can be undone by subtracting mod k or permuting backwards.

## simple lemma about pipelines

The connection between group structure and pipeline design seems like it merits a lot more attention than it gets. It’s not too hard to show that in a pipeline like the one to the right, the induced monoid of M1 must be a homomorphic image of the monoid of the composite state machine. This immediately brings in the limiting factors of simple groups.

KR theory is too vague though. For example, I’d like to know something about the relationship of M and M’ in the case that the induced monoids are isomorphic. They definitely do not have to be the same, but it seems like they should be similar enough that changing the output map would make them equivalent.

## Process algebra (updated)

The first part of a critique of process algebra is below. This relates to the Recursion and State paper and explicatory blog entry where I show how to compose classical automata and define them “abstractly”, to a complaint about the weak critique of automata theory in standard process algebra literature, and also to a remark about Dijkstra’s error. There are other parts of it scattered around in some recent posts and some other issues that need to be raised, but the basic argument is in place.

## Recursion and state

Despite some deep results, algebraic automata theory has fallen out of favor in theoretical computer science. Reasons include the discipline’s failings, such as a love of over-generality, the weak mathematical background of people working on “formal methods”, and the gap between theoreticians and engineers. But perhaps the key reason is that traditional state machine presentations in terms of state sets and transition maps are awkward and carry too much irrelevant information.

The standard presentation of a Moore machine is M=(A,X,S,st,δ,γ) where “A” is the alphabet of events, “X” is the set of outputs, “S” is the set of states, “st” is the start state, δ: S x A → S is the transition function and γ: S → X is the output function. But we really don’t care about the state set in all but the simplest state machines. Interesting systems have ridiculously large numbers of states. Names of states don’t matter. Extra states don’t matter. Suppose “s” is an element of “S” that is unreachable – no sequence of inputs will drive M to “s” from “st”. In that case what difference does removing “s” from “S” make? Or suppose s and s’ are both reachable and not only is γ(s)=γ(s’) but for any sequence of inputs that can drive the machine from s to a new state, the output of that new state is identical to the output of the state reached by following the same sequence of inputs from s’. In that case, we could simplify S and δ by removing one of the duplicative states and merging paths.

What’s more interesting is to consider what output a Moore machine will produce from a sequence of inputs: thinking of the Moore machine as a map from input strings to outputs. This is a function that can be constructed by primitive recursion on strings. First, extend δ to strings to define a new function Δ. The empty string that has zero length doesn’t cause any state change, Δ(s,empty)=s, and if “w” is a string and “wa” is the string obtained by appending “a” to “w”, then Δ(s,wa)=δ(Δ(s,w),a). Define M*(w) = γ(Δ(st,w)). The idea is that M*(w) is the output produced by M in the state reached by following “w” from the initial state. If you look at M* you see it contains all the interesting information in M – because the names of states and unreachable states and duplicate states are not interesting. In fact in the 1950s Myhill and Nerode described how to generate a state machine (not necessarily a finite state one) from any function on the set of strings over an alphabet.
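The construction of M* by primitive recursion is easy to write down. A minimal sketch, using a toy machine of my own (output is the parity of “a” events) since the text doesn’t fix a particular M:

```python
# A toy Moore machine M = (A, X, S, st, delta, gamma): output is the
# parity of "a" events seen so far. The machine is my example; the
# construction of M* is exactly the primitive recursion in the text.
A = {"a", "b"}
X = {"even", "odd"}
st = 0

def delta(s, a):
    return s ^ 1 if a == "a" else s

def gamma(s):
    return "even" if s == 0 else "odd"

def Delta(s, w):
    """Delta(s, empty) = s; Delta(s, wa) = delta(Delta(s, w), a)."""
    for a in w:
        s = delta(s, a)
    return s

def M_star(w):
    """M*(w): the output in the state reached by following w from st."""
    return gamma(Delta(st, w))

assert M_star("") == "even"
assert M_star("ab") == "odd"
assert M_star("abba") == "even"
```

Note that M* never mentions state names: two machines with different state sets but the same M* are indistinguishable, which is the point of the paragraph above.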

These functions are much more useful in describing complex state systems than are Moore machines – not least because they can be partially specified when we don’t know or care about all the possible behaviors of the system and because they can be composed in a way to represent any interconnection.

In the next post, I’m going to discuss composition of message passing processes and compare to the “formal methods” approach which is based on the incorrect assumption that automata need to be “enhanced” in some way and that replaces automata with things that have very little mathematical structure at all.