Queues and algebra

Suppose we have a state machine Q that implements a common first-in, first-out queue. The input alphabet of Q consists of “Deq” and “Enq x”, where “x” ranges over a set of values, say, V. Let’s fix the maximum length of Q at some constant k. As usual, given a sequence of events “w”, write Q(w) for the output of Q in the state reached by following “w” from the initial state. For the empty string NULL, define Q(NULL)=(). If a=Deq then define Q(wa) depending on Q(w): if Q(w)=() or Q(w)=(x) then Q(wa)=(), and if Q(w)=(x1, …, xn) with n>1 then Q(wa)=(x1, …, x(n-1)) – dequeuing removes the oldest element, which sits at the end. If a=Enq x then: if Q(w)=() define Q(wa)=(x); if Q(w)=(x1, …, xn) where n<k define Q(wa)=(x,x1, …, xn); and otherwise leave Q(wa)=Q(w). The number of states of Q is a function of the number of elements of V and the constant k, and grows rapidly as they grow: 1 + sum[i=1,k](|V|^i), counting the empty-queue state.
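For concreteness, here is a minimal Python sketch of Q (the encoding and names are mine): the state is just the queue contents, newest element first.

```python
# Sketch of the bounded FIFO machine Q described above.
# State = tuple of stored values, newest first; K is the max length k.
K = 3

def step(state, event):
    """Apply one input event ("Deq" or ("Enq", x)) to a state."""
    if event == "Deq":
        return state[:-1]          # drop the oldest element (the last one)
    _, x = event                   # ("Enq", x)
    if len(state) < K:
        return (x,) + state        # newest element goes in front
    return state                   # full queue: Enq is ignored

def Q(events):
    """Q(w): the state reached by following input string w from ()."""
    state = ()
    for e in events:
        state = step(state, e)
    return state

print(Q([("Enq", "a"), ("Enq", "b"), "Deq"]))   # ('b',) - "a" was dequeued
```

Note that a Deq on the empty queue leaves it empty, matching the definition above.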

If you visualize the graph of Q, you can see it contains non-trivial loops – which means that the monoid associated with Q contains groups. The advantage of decomposing Q into a product of smaller machines is that this might guide a smarter implementation. Well, I’m kind of cheating here, because programmers have found a variety of smart implementations that do strip out the group structure.

The easy decomposition produces two counters and a collection of k storage elements whose state machines are free of non-trivial loops: an input “store x” moves the machine to an “x” state, and the only loops are trivial self-loops. The counters are group machines. We number the storage cells 0 to k-1. We can have one mod-k counter that “points” to the tail of the queue (the cell holding the oldest element) and one counter, with k+1 states, that counts the size of the queue. On an enq operation, the enqueued value is stored in the storage element identified by subtracting the size from the tail mod k, and the size is incremented mod k+1. An enq operation when size=k is ignored and has no effect, and likewise a deq when size=0. On a deq, the size is decremented and the tail is decremented by one mod k, so that it points at the next-oldest element (this matches storing new values at tail minus size). What we’ve done is construct a system in which the state machine monoids are two cyclic groups and k aperiodic monoids. If we then switch to more complicated data structures, like ordered lists, it’s not that hard to see how a permutation group might be used to manipulate pointers to a pile of storage locations. For example, (n1,n2,n3,n4,blank,n5,n6) might be an ordered list with n5 and n6 being free. An enqueue would generate some shuffle to, say, (n1,n2,n5,n3,n4,blank,n6) and then a dequeue would produce (n2,n5,n3,n4,blank,n6,n1).
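A sketch of this decomposition in Python (names are mine; note that, to stay consistent with storing enqueued values at tail minus size mod k, the tail here moves down by one mod k on a dequeue):

```python
# Decomposition of the bounded queue into two counters plus k storage cells.
# All names are illustrative. The tail is a mod-K counter pointing at the
# cell holding the oldest element; size is a counter with K+1 states.
K = 4

class CounterQueue:
    def __init__(self):
        self.cells = [None] * K   # k aperiodic "store x" machines
        self.tail = 0             # mod-K counter: cell of the oldest element
        self.size = 0             # counter with K+1 states

    def enq(self, x):
        if self.size == K:        # full: ignored, as in the spec above
            return
        self.cells[(self.tail - self.size) % K] = x   # newest element's cell
        self.size += 1

    def deq(self):
        if self.size == 0:        # empty: nothing to do
            return None
        x = self.cells[self.tail]
        self.tail = (self.tail - 1) % K   # next-oldest is one cell "down"
        self.size -= 1
        return x

q = CounterQueue()
q.enq("a"); q.enq("b"); q.enq("c")
print(q.deq(), q.deq(), q.deq())   # a b c
```

The storage cells never cycle (each just holds its last stored value), while all the cyclic behavior is concentrated in the two counters.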

A nice property of groups is that “undo” always works. So if the data in the storage elements can be marked in some way, a deq or enq can be undone by subtracting mod k or permuting backwards.
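As a tiny illustration (with my conventions: a deq moves the tail by -1 mod k and decrements the size), the counter part of a deq has an exact group inverse:

```python
# The counter moves live in cyclic groups, so each move has an exact
# inverse. Conventions are illustrative: deq = (tail - 1 mod K, size - 1).
K = 4

def deq_counters(tail, size):
    """Counter half of a dequeue."""
    return ((tail - 1) % K, size - 1)

def undo_deq_counters(tail, size):
    """The group inverse of the dequeue move."""
    return ((tail + 1) % K, size + 1)

state = (2, 3)
assert undo_deq_counters(*deq_counters(*state)) == state
# The dequeued value is never erased from its storage cell, so restoring
# the counters restores the entire queue state.
print("undo restores", state)
```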

simple lemma about pipelines

The connection between group structure and pipeline design seems like it merits a lot more attention than it gets. It’s not too hard to show that in a pipeline like the one pictured to the right, the monoid induced by the first stage M1 must be a homomorphic image of the monoid of the composite state machine. This immediately brings in the limiting factors imposed by simple groups.

Krohn–Rhodes theory is too vague here, though. For example, I’d like to know something about the relationship of M and M’ in the case where the induced monoids are isomorphic. The machines definitely do not have to be the same, but it seems like they should be similar enough that changing the output map would make them equivalent.

The Amory Lovins bottleneck

Lovins observes that power inputs in many industrial processes go into a bottleneck that makes power conservation hard if you start at the wrong end. The power goes into a long pipeline of processes that emerges on the other end with some useful (in theory) work. If you start on the power-input end, then reducing power x% requires percolating incremental improvements down the chain of linked machinery, with each step reducing work at the step further down the pipeline. But if you start on the other end, changes automatically flow upward. The same, obviously, holds true for data centers. If you start by improving the power efficiency of air-conditioning – a good thing in itself – you cannot obtain the scale of improvements that can be gained at the other end of the pipeline by reducing the activities that use power and generate heat. That is, if you can increase work-done/computational-steps, you drive savings up the pipeline. And the kind of large-scale savings Lovins achieves in other industrial processes seems plausible: if you reduce power demand at the work end enough to reduce the cooling needed, so that a smaller air-conditioning unit can be used, you have potentially greater savings than you could get by improving the efficiency of the air-conditioning unit itself.

An interesting article in ACM communications: is the world ending?

Coverity has a program that reads other programs looking for errors. The company started as a research project at Stanford (how unusual!) and the Communications article is really about what they found in the commercial world. One thing they found was a lot of crappy programmers.

Upon seeing an error report saying the following loop body was dead code, a customer responded:


“No, that’s a false positive; a loop executes at least once.”
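The snippet from the article isn’t reproduced above, so here’s a purely illustrative stand-in of my own (not Coverity’s code): a loop whose body provably never runs, which a checker would rightly flag as dead code.

```python
# Illustrative only - not the code from the article. The loop body below
# is dead: range(0) is empty, so the body can never execute, contradicting
# the belief that "a loop executes at least once."
hits = 0
for i in range(0):
    hits += 1          # dead code: unreachable

print(hits)   # 0
```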

Another thing they found is that “it’s more of a guideline than a rule” is a common attitude among compiler writers – an interesting problem when your program has to parse code to understand it. Microsoft’s “precompiled header files” feature is a great example of a non-value-add. Here’s an amazing observation:

On the other hand, C# and Java have been easier, since we analyze the bytecode they compile to rather than their source.

What that says about the state of the art in abstract specification of programming languages is important. Perhaps bytecode is a better target for all sorts of other program transformations, including efforts to extract parallelism.

The article also gives a flavor of what it’s like to fall out of the university programming environment and wander across the strange lands of commercial practice, where fossils still walk, unsolvable problems are routinely solved, and solved problems are impossible.

I liked the advice to make presentations to as large a group as possible, in the interest of attracting at least one smart person who will get it. There’s nothing more dispiriting than a presentation to a technical group that is in that sort of sullen, defensive mood in which lousy practice and ignorance are a fortress of job protection.

Toyota's problem: hardware weenies and poor accounting practices [updated]

Jamie Kitman’s look at the twisted path Toyota followed to its current difficulties inspired me to think about software and money – two topics I spend way too much time thinking about. As a purely disinterested observer (ahem), it has come to my attention, repeatedly, that manufacturing companies undervalue, underinvest in, and undertest software. On a sales visit to a manufacturer of giant industrial air-conditioners, I was told that some software improvements had reduced operating cost 25%, but that purchasing software tools or hiring advanced consultants or top-line engineers internally was out of the question. The software team was a capable enough group, but it did not contain anyone who could provide architectural perspective, help with design issues, or understand tradeoffs involving OS/processor/board and so on, so each project was a one-off desperate attempt to patch the existing undesign. The reason for this absence was simple: top-level software engineers could not be hired for $40K/year. The team manager sadly told me that accounting was not willing to place high value on weightless, invisible code, and so the usual rules for R&D inputs to the manufacturing process gave their team nearly nothing.

As a visitor to Japanese auto companies 10 years ago, I was blown away by the brilliance of the industrial design and the enormous investment in it, and shocked at the software development effort – one in which the software team was clearly considered to be low status. Not that my more limited experience with US companies was much more impressive. In fact, it was interesting that as Japanese auto companies were reaching out to at least look at advanced control software, US companies relegated that work to the outer periphery. Of course, this was some time ago, but obviously a $100M investment in top-level control software development and test at Toyota a decade ago would have been a good investment – even if they had not given any of it to me.

Once I was visiting an office-products manufacturer and discussing their new project. The hardware team had chosen a processor for which there was no solid compiler suite, no existing strong web, network, or other components, etc., because it was cheap. I explained that the choice of processor was fine with us, but that our prices for providing basic RTOS support would be high due to the necessity of compensating for this environmental limitation. The team broke into a raucous argument between the software developers and the hardware engineers – with the software people basically yelling “we told you, idiots!”.

One of the things that struck me at a Japanese auto company was how the manufacturing discipline that allowed them to make such high-quality hardware was not even attempted with software. Software is still not taken seriously as a manufacturing area – and the consequences are not happy ones. See the article by the founders of Coverity for a great tour of the strange views tolerated within software development:

Do bugs matter? Companies buy bug-finding tools because they see bugs as bad. However, not everyone agrees that bugs matter. The following event has occurred during numerous trials. The tool finds a clear, ugly error (memory corruption or use-after-free) in important code, and the interaction with the customer goes like this:


“Isn’t that bad? What happens if you hit it?”

“Oh, it’ll crash. We’ll get a call.” [Shrug.]

Recursion and state update and Hungarian Mathematicians

Please see this ever-changing page, which seeks to demonstrate, in a poorly organized yet humorous manner, what the hell I’m trying to accomplish with this primitive recursive state machine stuff.

And as an added bonus – the great “Recursive Functions” book by Rózsa Péter contains the following remarkable note that has nothing to do with the mathematics, but a lot to do with how mathematicians think.

Ok, “the present conditions” were kind of dramatic.

At least 1,000 Soviet tanks are reported to have entered Budapest and troops deployed throughout the country are battling with Hungarian forces for strategic positions.
The Soviet air force has bombed part of the Hungarian capital, Budapest, and Russian troops have poured into the city in a massive dawn offensive.

Instructions per joule – a good start

… multi-GHz superscalar quad-core processors can execute approximately 100 million instructions per Joule, assuming all cores are active and avoid stalls or mispredictions. Lower-frequency in-order CPUs, in contrast, can provide over 1 billion instructions per Joule—an order of magnitude more efficient while still running at 1/3rd the frequency. Worse yet, running fast processors below their full capacity draws a disproportionate amount of power:

Dynamic power scaling on traditional systems is surprisingly inefficient. A primary energy-saving benefit of dynamic voltage and frequency scaling (DVFS) was its ability to reduce voltage as it reduced frequency [56], but modern CPUs already operate near minimum voltage at the highest frequencies. Even if processor energy was completely proportional to load, non-CPU components such as memory, motherboards, and power supplies have begun to dominate energy consumption [3], requiring that all components be scaled back with demand. As a result, running a modern, DVFS-enabled system at 20% of its capacity may still consume over 50% of its peak power [52]. Despite improved power scaling technology, systems remain most energy-efficient when operating at peak utilization

Andersen, D. G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., and Vasudevan, V. 2009. FAWN: a fast array of wimpy nodes. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11 – 14, 2009). SOSP ’09. ACM, New York, NY, 1-14. DOI= http://doi.acm.org/10.1145/1629575.1629577

Once you start thinking about instructions/joule, you need to also think about work/instruction. This number is going to make current systems look a lot worse than even the dismal summary above, because the immense amount of overhead is so overwhelming. It’s not even just OS and virtual machine overhead, as horrible as that is. Consider a Java program. Anyone who has ever looked at production Java code can testify that the number of layers of “frameworks” and “libraries” and so on between a program and actual computation is not small. So consider a program that reads data from the network, processes it, stores it in a DB, and sends responses. This is not an unusual program model. Ok: we begin with an interrupt that must navigate the virtual machine layer to some OS that most likely contains code replicating most of the functionality of the VM layer, then layers of protocol fiddling, the sad fumbling over locks in a “fine-grained locking” OS, the clumsy process model and atrocious scheduling left over from the days in which CPUs were scarce resources, the IO layer (DEAR GOD!!), the pointless copying of every bit of data over and over again, and then the agonizingly bumbling progress of the intent of the programmer dripping down through geological layers of Java to a byte-code interpreter – and that’s just the start of this slow-motion avalanche: electric power is being converted into waste heat with appalling recklessness.

I’m not sold on the “Fawn” model, but the problem analysis is right on the button.

sequential and combinational state machines

There’s a correspondence between the notions of “combinational” and “sequential” in digital circuit engineering and some structure in state machines (and therefore monoids) that seems interesting.

In digital logic, a “combinational” circuit like a logic gate can be associated with a time T so that the output depends only on input signals applied over the last T time units. If the propagation delay is T, then signals applied T+n time units ago are irrelevant to the current state. For example, an AND gate with propagation delay T has output 1 if all inputs have been HIGH over the last T time units. For a sequential circuit, like a latch, there is no magic time T. Suppose we set the stored value in the latch at time T1 and then signal it to hold the value. No matter how long the latch is held, we have to go back to absolute time T1 to find out what is in the latch. It’s worth noting, however, that it’s often very useful to give a sequential circuit a reset mode where it behaves like a combinational circuit. That is, we want to be able to say: no matter what previous signals have done to a memory device, some sequence of signals will clear it and, for example, put the number 3 in it. More on this below.

Take a Moore machine M and write State(M,w) for the state reached by following input string “w” from the initial state. I’m limiting us to complete Moore machines where State(M,w) must be defined for every w. Say M is combinational if there is some T so that whenever w=uz and length(z) >= T, then State(M,w)=State(M,z). So the idea is that there is a T so that the last T inputs determine the state of M. Call the least such T the “depth” of M. As an example, suppose M represents a memory location and the inputs are just data values to store. Then we only need the last input: T=1. If M is an AND gate with two pins and a delay of 2, then we can think of inputs as numbers 0,1,2,3 representing their 2-bit encodings: 0 for 00, 1 for 01, 2 for 10 and 3 for 11. States can be pairs [x1,x2] recording the last two inputs, with x2 the most recent, so that if the state is [a,b] and input “c” is received, the new state is [b,c]. The state is actually a shift register. Suppose that the states are all pairs [x1,x2] where x1 and x2 are in the set {-1,0,1,2,3}, with “-1” representing an unknown signal. Then [3,3] has output 1; [x1,x2] where x1 and x2 are both greater than -1 but not both equal to 3 outputs 0; and the others have undefined output. The initial state is [-1,-1]. Obviously this machine is combinational with depth T=2. The intuition here is that combinational state machines always “swim” to a fixed state given a fixed sequence of inputs of sufficient length. Obviously, this is a useful property of a device, and even for sequential devices we are going to want some single “reset” input, or sequence of them, that, no matter what the current state, forces the machine to a known state. That is, for our latch, we’d want to be able to force it to an assigned latched value, no matter what, if anything, it had previously latched. So properly designed sequential circuits are going to look like some combination of combinational and sequential circuits.
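The AND-gate example can be checked mechanically. Below is my own Python encoding of it (the shift register keeps the most recent input in the second slot), with a brute-force check that any suffix of length 2 determines the state:

```python
from itertools import product

# AND gate with delay 2, modeled as a Moore machine whose state is the
# pair of the last two inputs (a 2-stage shift register). Inputs 0..3
# encode the two pins; -1 marks "unknown". Encoding is illustrative.
INPUTS = [0, 1, 2, 3]

def step(state, c):
    x1, x2 = state
    return (x2, c)              # shift register: remember the last two inputs

def run(w, state=(-1, -1)):
    for c in w:
        state = step(state, c)
    return state

def output(state):
    x1, x2 = state
    if x1 == 3 and x2 == 3:     # both inputs HIGH for the last 2 time units
        return 1
    if x1 > -1 and x2 > -1:     # both known, not both 11
        return 0
    return None                 # undefined while an input is still unknown

# Depth T=2: any two trailing inputs determine the state, whatever came before.
for z in product(INPUTS, repeat=2):
    states = {run(list(u) + list(z)) for u in product(INPUTS, repeat=3)}
    assert states == {run(list(z))}
print("combinational with depth 2")
```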

Now consider products of state machines. The method I’ve been using for representing state machines as maps f: InputSequences -> Outputs is convenient for products. If f represents a state machine M, then f(w) = Output(State(M,w)). If f represents a combinational state machine of depth T, then f(wuz)=f(uz) whenever length(u) >= T, because State(M,wu)=State(M,u), so State(M,wuz)=State(M,uz), so the outputs must be the same. Say F(w) = (g1(w1), …, gn(wn)) is a product under connection function Phi and the map w |-> (w1, …, wn), where when w=empty then wi=empty for all i, and when w maps to (w1, …, wn) then wa maps to (w1.Phi(a,1,F(w)), …, wn.Phi(a,n,F(w))), with ‘.’ representing string concatenation. Say that F is a cascade product if and only if Phi(a,i,(x1, …, xn)) depends only on the xj for j < i. That is, in a cascade product, information only flows from lower-numbered factors to higher-numbered ones: there is no feedback. If g1, …, gn are combinational, AND Phi(a,i,(x1, …, xn)) != empty for every combination of parameters, AND F is a cascade product, then F must be combinational. For a proof sketch, suppose Ti is the depth of each component gi and suppose w=uv where length(v) >= T1. Then w1=u1.v1 where length(v1) >= T1, so g1 reaches the same state on w1 as on v1, and more than that, it tracks the same states on w1.z and v1.z for all z. By the cascade property, the inputs Phi feeds to factor 2 then agree from that point on, and so on – eventually we see that the depth of F is at most the sum of the depths of the factors! For intuition: first we need enough inputs to steer factor 1 to a known state, then factor 2, and so on.
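Here is a toy cascade product of my own construction that illustrates the depth claim: two depth-1 factors, each a one-cell memory over {0,1}, with factor 2 fed by the raw input combined with factor 1’s previous state (feed-forward only, so it is a cascade). A brute-force check confirms the composite is combinational with depth at most 1+1=2.

```python
from itertools import product

# Toy cascade product, purely illustrative. Factor 1 stores the raw input;
# factor 2 stores (input XOR factor 1's previous state). The connection
# for factor 2 depends only on factor 1's state, so this is a cascade.

def run(w):
    s1, s2 = 0, 0                 # start both one-cell memories at 0
    for a in w:
        a2 = a ^ s1               # Phi(a, 2, ...) depends only on factor 1
        s1, s2 = a, a2            # each factor just stores its last input
    return (s1, s2)

# Each factor has depth 1, so the composite should have depth at most 2:
# any suffix of length 2 pins down the state, regardless of the prefix.
for z in product([0, 1], repeat=2):
    states = {run(list(u) + list(z)) for u in product([0, 1], repeat=3)}
    assert len(states) == 1
print("cascade is combinational with depth <= 2")
```

Note that a suffix of length 1 is not enough here (factor 2’s cell still remembers the prefix), so the depths genuinely add.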

Given a state machine M with states S, we can associate a map from S to S with every string of inputs w, so that F(M,w)(s)=s’ if and only if w leads from s to s’. This set of maps forms a monoid under ordinary function composition, and since F(M,z)(F(M,w)(s))=F(M,wz)(s), we have F(M,w)+F(M,z)=F(M,wz), where + is the monoid operation and not anything necessarily related to arithmetic. The empty string produces the monoid identity: F(M,empty)(s)=s. Look at the monoid structure of combinational state machines and note that unless the monoid is trivial, F(M,w)=F(M,empty) only if w=empty – no nonempty string acts as the identity.

The proof is easy. Suppose M is combinational with depth T, the monoid is not trivial, so that for some v we have F(M,v) != F(M,empty), and suppose F(M,z)=F(M,empty). If z is not empty, let Z consist of a string of concatenated z’s longer than T: Z=zzzzz… stretched out to a length greater than T. By the combinational property, for any u, F(M,uZ)(s)=F(M,Z)(s)=s, since F(M,Z) is a power of F(M,z)=F(M,empty). But F(M,uZ)(s)=F(M,Z)(F(M,u)(s))=F(M,u)(s), so F(M,u)(s)=s for all u and s, which implies the monoid is trivial after all. It’s also worth pointing out that the combinational monoids seem to be a proper subset of the aperiodic monoids ( http://www.liafa.jussieu.fr/~jep/PDF/HandBook.pdf ). Consider the state machine with states {0,1,2} and inputs {a,b} where, treating the inputs as maps from states to states, a(0)=1, b(0)=2, and for any non-zero state s both inputs leave s fixed. The monoid is clearly aperiodic, since a^2=a^3 (a^2(0)=1 and a^2(s)=s for s != 0), and similarly for the other elements. But the state machine is not combinational: the strings a^n and b.a^n share arbitrarily long common suffixes yet lead from the initial state 0 to the different states 1 and 2.
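The example can be checked mechanically; here is a small Python verification (my encoding) that the monoid is aperiodic while the machine fails the combinational property:

```python
# The example machine: states {0,1,2}, inputs a,b acting as maps.
# a sends 0 to 1, b sends 0 to 2, and both fix every nonzero state.
STATES = (0, 1, 2)
a = {0: 1, 1: 1, 2: 2}
b = {0: 2, 1: 1, 2: 2}

def act(w, s):
    """Follow input string w (a list of input maps) from state s."""
    for f in w:
        s = f[s]
    return s

def as_map(w):
    """The monoid element F(M,w), as a tuple of images of 0,1,2."""
    return tuple(act(w, s) for s in STATES)

# Aperiodic: squaring equals cubing for the generators.
assert as_map([a, a]) == as_map([a, a, a])
assert as_map([b, b]) == as_map([b, b, b])

# Not combinational: a^T and b.a^T share an arbitrarily long suffix a^T
# but lead to different states from the initial state 0.
T = 10
assert act([a] * T, 0) != act([b] + [a] * T, 0)
print("aperiodic but not combinational")
```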

Well, more later, but that’s what I have semi-clearly explicated for now. I’ll leave you with this photograph of Camille Jordan who lived in an unenlightened era that did not make clear demarcations between mathematics and mere engineering.