## Queues and algebra

Suppose we have a state machine Q, that implements a common first in first out queue. The input alphabet of Q consists of “Deq” and “Enq x” where “x” ranges over a set of values, say, V. Let’s fix the max length of Q at some constant k. As usual, given a sequence of events “w”, write Q(w) for the output of Q in the state reached by following “w” from the initial state. For the empty string NULL, define Q(NULL)=(). If a=Deq then define Q(wa) depending on Q(w) – if Q(w)=() or Q(w)=(x) then Q(wa)=(). If Q(w)=(x1 .. xn) then define Q(wa)=( x1 …). If a=Enq x then if Q(w)=() define Q(wa)=(x), if Q(w)=(x1 .. xn) where n<k define Q(wa)=(x,x1 … xn) and otherwise leave Q(wa)=Q(w). The number of states of Q is a function of the number of elements of V and the constant k and grows rapidly as they grow – sum[i=1, k](|V|^i  ).

If you visualize the graph of Q, you can see it contains non-trivial loops – which means that the monoid associated with Q contains groups. The advantage of decomposing Q into a product of smaller machines is that this might guide a smarter implementation. Well, I’m kind of cheating here, because programmers have found a variety of smart implementations that do strip out the group structure.

The easy decomposition produces two counters, and a collection of k storage elements that are loop free state machines: an input “store x” moves the machine to a “x” state. They do contain loops but only trivial ones. The counters are group machines. We number the storage cells 0 to k-1.  We can have one mod k counter that “points” to the tail of the queue and one that counts the size of the queue. On an enq operation the enqueued value is stored in the storage element identified by subtracting the size from the tail mod k,   and the size is increment mod k+1. An enq operation when the size=k is ignored and has no effect. On a deq, the size is decremented and the tail is incremented by one mod k+1. What we’ve done is construct a system in which the state machine monoids are two cyclic groups and k-aperiodic monoids. If we then switch to more complicated data structures, like ordered lists, it’s not that hard to see how a permutation group might be used to manipulate pointers to a pile of storage locations. For example (n1,n2,n3,n4,blank, n5,n6) might be an ordered list with n5 and n6 being free. An enqueue would generate some shuffle to say, (n1,n2,n5,n3,n4,blank,n6) and then a dequeue would produce (n2,n5,n3,n4,blank, n6,n1).

A nice property of groups is that “undo” always works. So if the data in the storage elements can be marked in some way, a deq or enq can be undone by subtracting mod k or permuting backwards.

The connection between group structure and pipeline design seems like it merits a lot more attention than it gets. It’s not too hard to show that in a pipeline like the one to the right, the induced monoid of M1 must be a homomorphic image of the composite state machine. This immediately brings in the limiting factors of simple groups.

KR theory is too vague though. For example, I’d like to know something about the relationship of M and M’ in the case that the induced monoids are isomorphic. They definitely do not have to be the same, but seems like they should be similar enough so that changing the output map would equalize.

## The Amory Lovins bottleneck

Lovins observes that power inputs in many industrial processes go into a bottleneck that makes power conservation hard if you start at the wrong end.  The power goes into a long pipeline of process that emerges on the other end with some useful (in theory) work. If you start on the power input end, then reducing power x% requires percolating incremental improvements down the chain of linked machinery with each step reducing work at the step further down the pipeline. But if you start on the other end, changes automatically flow upward. The same, obviously, holds true for data centers. If you start by improving power efficiency of air-conditioning – a good thing in itself – you cannot obtain the scale improvements that can be gained on the other end of the pipeline by reducing the activities that use power and generate heat. That is, if you can increase work-done/computational-steps you drive savings up the pipeline. And the kind of large scale savings Lovins achieves in other industrial processes seem plausible: if you reduce power demand at the work end enough to reduce the inputs of cooling needed so that a smaller air conditioning unit can be used, you have a potentially greater savings than by improving the efficiency of the air conditioning unit.

## An interesting article in ACM communications: is the world ending?

Coverity has a program that reads other programs looking for errors. The company started as a research project from Stanford (how unusual!)  and the  Communications article is really about what they found in commercial world. One thing they found was a lot of crappy programmers.

Upon seeing an error report saying the following loop body was dead code

“No, that’s a false positive; a loop executes at least once.”

Another thing they found is that “it’s more of a guideline than a rule” is a common theory among compiler writers – something that is an interesting problem when your program has to parse code to understand it. Microsoft’s “precompiled” header files feature is a great example of a non-value add. Here’s an amazing observation:

On the other hand, C# and Java have been easier, since we analyze the bytecode they compile to rather than their source.

What that says about the state of the art in abstract specification of programming languages is important. Perhaps byte code is a better target for all sorts of other program transformations including efforts to extract parallelism.

The article also gives a flavor of what its like to fall out of the university programming environment and wander across the strange lands of commercial practice where fossils still walk, unsolvable problems are routinely solved, and solved problems are impossible.

I liked the advice to make presentations to as large a group as possible in the interests of attracting at least one smart person who will get it. There’s nothing more  despiriting than a presentation to a technical group that is in that sort of sullen defensive mood in which lousy practice and ignorance is a fortress of job protection.

## Toyota's problem: hardware weenies and poor accounting practices [updated]

Jamie Kitman’s look at the twisted path Toyota followed to it’s current difficulties inspired me to think about software and money – two topics I spend way too much time thinking about. As a purely disinterested observer (ahem) it has come to my attention, repeatedly, that manufacturing companies undervalue, underinvest in, and undertest software.  On a sales visit to a manufacturer of giant size industrial air-conditioners, I was told that some software improvements had reduced operating cost 25% but that purchasing software tools or hiring advanced consulting or top line engineers internally was out of the question. The software team was a capable enough group, but it did not contain anyone who could provide architectural perspective, help with design issues, understand tradeoffs involving OS/processor/board and so on so each project was a one-off desperate attempt to patch the existing undesign. The reason for this absence was simple: top level software engineers could not be hired for \$40K/year.  The team manager sadly told me that accounting was not willing to place high value on weightless, invisible code and so the usual rules for R&D for inputs to the manufacturing process gave their team nearly nothing.

As a visitor to Japanese auto companies 10 years ago, I was blown away by the brilliance of the industrial design and the enormous investment in that, and shocked at the software development effort – one in which the software team was clearly considered to be low status. Not that my, more limited, experience with US companies was much more impressive. In fact, it was interesting that as Japanese auto companies were reaching out to at least look at advanced control software, US companies relegated that work to the outer periphery. Of course, this was some time ago, but obviously a \$100M investment in top level control software development and test at Toyota a decade ago would have been a good investment – even if they had not given any of it to me.

Once I was visiting an office products manufacturer and discussing their new project. The hardware team had chosen to use a processor for which there was no solid compiler suite, no existing strong web or network or other components etc. because it was cheap. I explained that the choice of processor was fine with us, but that our prices for providing basic RTOS support would be high due to the necessity of compensating for this environmental limitation. The team broke into a raucous argument between the software developers and the hardware engineers – with the software people basically yelling “we told you idiots!”.

One of the things that struck me at a Japanese auto company was how the manufacturing discipline that allowed them to make such high quality hardware was not even attempted with software. Software is still not taken seriously as a manufacturing area – and the consequences are not happy ones. See the article by founders of Coverity for a great tour of the strange views tolerated within software development

Do bugs matter? Companies buy bug-finding tools because they see bugs as bad. However, not everyone agrees that bugs matter. The following event has occurred during numerous trials. The tool finds a clear, ugly error (memory corruption or use-after-free) in important code, and the interaction with the customer goes like thus:

“So?”

“Isn’t that bad? What happens if you hit it?”

“Oh, it’ll crash. We’ll get a call.” [Shrug.]

## Recursion and state update and Hungarian Mathematicians

Please see this ever modifying page which seeks to demonstrate in a poorly organized, yet humorous manner, what the hell I’m trying to accomplish with this primitive recursive state machine stuff.

And as an added bonus – the great “Recursive Functions” book by Rozsa Peters contains the following remarkable note that has nothing to do with the mathematics, but a lot to do with how mathematicians think.

Ok, “the present conditions” were kind of dramatic.

At least 1,000 Soviet tanks are reported to have entered Budapest and troops deployed throughout the country are battling with Hungarian forces for strategic positions.
The Soviet air force has bombed part of the Hungarian capital, Budapest, and Russian troops have poured into the city in a massive dawn offensive.

## Instructions per joule – a good start

… multi-GHz superscalar quad-core processors can execute approximately 100 million instructions per Joule, assuming all cores are active and avoid stalls or mispredictions. Lower-frequency in-order CPUs, in contrast, can provide over 1 billion instructions per Joule—an order of magnitude more efficient while still running at 1/3rd the frequency. Worse yet, running fast processors below their full capacity draws a disproportionate amount of power:

Dynamic power scaling on traditional systems is surprisingly inefficient. A primary energy-saving benefit of dynamic voltage and frequency scaling (DVFS) was its ability to reduce voltage as it reduced frequency [56], but modern CPUs already operate near minimum voltage at the highest frequencies. Even if processor energy was completely proportional to load, non-CPU components such as memory, motherboards, and power supplies have begun to dominate energy consumption [3], requiring that all components be scaled back with demand. As a result, running a modern, DVFS-enabled system at 20% of its capacity may still consume over 50% of its peak power [52]. Despite improved power scaling technology, systems remain most energy-efficient when operating at peak utilization

Andersen, D. G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., and Vasudevan, V. 2009. FAWN: a fast array of wimpy nodes. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11 – 14, 2009). SOSP ’09. ACM, New York, NY, 1-14. DOI= http://doi.acm.org/10.1145/1629575.1629577

Once you start thinking about instructions/joule, you need to also think about work/instruction. This number is going to make current system look a lot worse than even the dismal summary above because the immense amount of overhead is so overwhelming. It’s not even just OS and Virtual machine overhead, as horrible as that it. Consider a Java program. Anyone who has ever looked at production Java code can testify that the number of layers of “frameworks” and “libraries” and so on between a program and actual computation is not small. So consider a program that reads data from the network, processes, stores in a DB, and sends responses. This is not an unusual program model. Ok: we begin with an interrupt that must navigate the virtual machine layer to some OS that most likely contains code that replicates most of the functionality of the VM layer, then  layers of protocol fiddling, the sad fumbling over locks in the a “fine grained locking” OS, the clumsy process model and atrocious scheduling left over from the days in which CPUs were scarce resources, the IO layer (DEAR GOD!!), the pointless copying of every bit of data over and over again, and then the agonizingly bumbling progress of the intent of the programmer dripping down through geological layers of Java to a byte-code interpreter – and that’s just the start of this slow motion avalanche: electric power is being converted into waste heat with appalling recklessness.

I’m not sold on the “Fawn” model, but the problem analysis is right on the button.