undefined behavior and the purpose of C

The C programming language, by design, permits programmers to work  at both low and high levels. In particular, for applications like operating system development,  cryptography,  numerical methods, and network processing, programmers can go back and forth between treating data as representing abstract types and as machine addressable sequences of bits, between abstract operations and low level, machine dependent operations. Capturing this sophisticated approach in a formal standard is not simple, and often the writers of the ISO C Standard have thrown up their hands and labeled the effects of non-portable and potentially non-portable operations  “undefined behavior” for which they provided only a fuzzy guideline.  Unfortunately, the managers of the gcc and clang C compilers have increasingly ignored the guideline and ignored well-established well-understood practice, producing  often bizarre and dangerous results and a flock of complaints. 

  • There was a CERT security alert.
  • Extremely negative reviews from the Linux developers (“the C standard is _clearly_ bogus shit” – Linus Torvalds)   which made GCC developers  provide an ad-hoc collection of compiler flags for an unstable non-Standard dialect of C.
  • A warning from the chief Clang architect that compiler changes  mean that “huge bodies of C code are land mines just waiting to explode” (followed later by a suggestion that programmers switch to other languages).
  • paper from the foremost researcher in cryptography asking for a version of C that would be “safe” from dangerous “optimizations”.
  • A well thought out and completely ignored  academic proposal for a “friendly” C compiler and standard. 
  • A less polite academic analysis, also ignored. 

As an example of what provoked these reactions, consider a small C program that, without a warning, when passed though the current gcc and Clang highest level optimizers can’t recognize -2147483648 as a negative number. The same program  works correctly when compiled at lower levels of “optimization”, or with optimizations by earlier versions of clang and gcc or even currently by the Intel C compiler.  And consider the following, idiomatic C code, transformed into an infinite loop that never gets to emergency shutdown.

//keep opening valve until pressure down or do emergencyshutdown
for(int i=0; i >=0; i++){ //try it a bunch of times
if ( getpressure() > DANGER)openvalve();
if (getpressure() > DANGER)emergency_shutdown();

You can see how this works in the gcc8 generated machine code for the x86 – a processor family that natively wraps arithmetic overflow to do the right thing. The increment, the check, and the call to emergency shutdown are all deleted by the compiler – as an “optimization”. 

/Machine code output of GCC8 with O3 optimization level.
sub rsp, 8
call getpressure
cmp eax, 10
jle .L4
call openvalve
jmp .L4

Of course, this code is perhaps not the best C code, but it accords with the customary and expected behavior of the language over decades of practice. Earlier versions of Gcc compile the code correctly as does the Intel compiler. So tested, working safety critical code, moved to a new version of the same compiler will fail with no warning.

The ISO C Standards Committee is oblivious 

When I visited the ISO WG14 C Standards committee meeting in Pittsburgh this winter to ask for some corrective action, I found a peculiar lack of urgency.  The most interesting comments were from a long-time committee member who seemed to believe that C had become a hopeless cause and from another attendee who suggested that fixing contradictions in the standard was not urgent because there are so many of them. The gcc specific  carve-out that makes Linux development even possible is  unstable and fragile – relying on often poorly documented escapes that will be constantly under pressure from the design trajectory of both gcc and clang compilers (which appear to be excessively focused on artificial benchmarks).  As more sophisticated compilation methods, such as link time optimizations, become more prevalent, the opportunities for catastrophic silent transformations will proliferate.

Already, gcc has created a serious security violation in Linux by silently deleting an explicit programmer check for a null pointer (after which both GCC and Clang were forced to add a specific flag disabling that “optimization”). The Linux approach of stumbling into dangerous “optimizations” and then protecting against them is neither reliable not extensible to projects with fewer or less sophisticated users/testers. Given institutional unwillingness to think about the problem at either the standards or compiler levels, some outside intervention is necessary if anything is going to be fixed. Fortunately, the Standard can be mostly repaired by a very simple change (although it would be good if there was some continuing active involvement in the Standard from people involved in operating systems development).  

  1. The ISO C standard includes a long standing, well intentioned but botched effort to strengthen C type system in a way that has produced a contradictory and complex set of rules under which a vast body of C code can be considered to embody “undefined behavior” (often with no remedy in the Standard).  Not only is is not possible to access “device memory”, but it is impossible to write a memory allocator let alone a virtual memory system, check for arithmetic overflow, reliably check for null or invalid pointers, checksum network packets,  encrypt arbitrary data,  catch hardware interrupts, copy arbitrary data structures, use type punning, or, most likely,  to implement threads. Application programs written in C face the same problems and worse. For example, the standard POSIX function  “mmap” appears to be intrinsically “undefined behavior”.  Everything, however, worked for decades until recent compiler changes.
  2. The primary open source compiler developer groups have seized on the sloppy definition of undefined behavior in the Standard to justify a series of increasingly dangerous silent code transformations that break working code. For reasons explained below, I will call these transformations  Beyond Guideline Transformations (BGTs). 

The innovation: Beyond the Guideline Transformations 

The key passage in the C11 Standard is the following (my highlight in red)

undefined behavior behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
  • 2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
  • 3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.

The compiler developers have seized on “imposes no requirements” to claim that they are entitled to do anything, no matter how destructive, when they encounter undefined behavior even though the guideline in Note 2 captures the customary interpretation.  At one time, it was generally understood that the undefined status of signed integer overflow meant that the native arithmetic operations of the target processor determined the result: so that often  “i+ j” in C can be compiled to a single machine instruction.  If the result of the addition overflows the processor representation of an signed integer – anything could happen -the burden was on the programmer, processor, and execution environment to detect and handle the problem.  Almost every modern mainstream processor simply rolls over as expected, although there are some processors that generate traps and some embedded devices with things as odd as saturating arithmetic.  None of that is a problem under the customary compilation of the language or within the suggested guideline of “Note 2”. The compiler can just leave it up to the underlying machine, the execution environment, and the programmer to sort out.  But, Gcc/Clang developers, with the tacit acquiescence of the ISO standards committee, now claim the compiler is free to assume undefined behavior “can’t happen” and that the compilers can silently remove programmer checks for it – even while generating code that produces the condition.

This, “beyond the guideline” (BTG) approach is relatively new and is still unknown to many practitioners and involves major conflicts with the ANSI C definition in the second edition Kernighan and Ritchie book: which is the definitive specification .   You can follow Linux’s lead and block the overflow  “optimization” with “frwapv”  which forces wrapping semantics (in some places) but that’s not even necessarily what programmers want and there are many other places where C code is vulnerable to similar transformations. For example, in machines with linear addressing, it is not unusual to check a pointer is in a range with code like:

if( p < startptr || p >= endptr)error()

This code is straightforward and easy to compile on the dominant processor architectures of the day, ARM, x86, MIPS, and similar. It might be a problem on, say, segmented architectures or elaborately typed architectures such as the not lamented Itanium.  It’s also possible that, depending on how the pointer “p” is generated that it might violate complex, contradictory,  and hard to understand type rules for pointer comparisons and fall into the “undefined behavior” bucket. In that case, the compiler developers claim that they can silently delete the check – after all, undefined behavior “can’t happen”. All the code that depends on these types of checks, checks that are reasonable C code and that have worked for decades, may encounter an unpleasant surprise at any future time when the compilers get around to “optimizing” those checks away. And, as usual, there is no guaranteed way in the Standard to “fix” this check. Suggestions involving converting pointers to integers run into other undefined behavior and the optional status of the required data type.

Since common sense has not sufficed, both the compiler development groups and the Standards need an intervention.  The decision to ignore the guideline is easily repaired, mostly by removing a paragraph mark – which in the arcane world of ISO standards, transforms a “non-normative”  guideline into a mandatory rule. 

undefined behavior behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements other than that : Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). It is not acceptable to change the semantics of correct program structures due to undefined behavior elsewhere. 

I’d add the following note as well

NOTE: Undefined behavior, particularly, due to non-portable constructs is a normal component of C programs.  Undefined behavior due to errors that the translator can detect should produce a diagnostic message.


Undefined behavior is, currently, unavoidable in the C standard

One of the common arguments made by some of the compiler developers who champion BTG transformations is based on a false dichotomy: that without license for BTG transformations, compilers would have to provide an intrusive run-time that guaranteed predictable results and the Standard would have to spell out every possible behavior. This speaks to a fundamental misunderstanding of the purpose and use of C. 

  • How can the compiler distinguish between an erroneous read of uninitialized memory which will produce unspecified results (or even a trap on some architectures) and a syntactically equal chunk of code that reads a memory mapped device control register which  cannot be written? The designers of C solved the problem by leaving it up to the programmer and the execution environment: the compiler produces a load operation, the semantics are not its problem – they are “undefined behavior”. 
  • How can the compiler or standard distinguish between a mistaken use of an integer pointer to access a floating point number and a numerical methods expert’s shortcut divide by 2 which involves changing the binary representation of the exponent? That’s not the responsibility of the compiler.
  • How can the compiler tell the difference between a null pointer dereference in user space where it will, usually, cause a trap on memory mapped systems, and one where the zero page is accessible memory? It cannot: the programmer and operating system can handle it. The best the compiler can do is warn.

What would be nice is if the compilers could warn about possible errors and if the Standard provided opt-in methods of locking down access methods and preventing common errors like array boundary and type violations.  C developers increasingly rely on sophisticated static analysis tools and test frameworks to detect common errors and have embraced methodologies to limit the kinds of haphazard programming that was common earlier. But all of that is very different from retroactive, silent, language changes based on opaque and often ill considered rules.

The developers of the C Standard were forced to add a hurried character pointer exception to their type rules when they discovered they had  inadvertently made it impossible to write the fundamental memcpy function in C. The exception makes the type rules ineffective while still being cumbersome ( Brian Kernighan made exactly this criticism of Pascal decades ago). And it certainly did not fix the problem.  For example, the Standard only makes malloc/free/calloc and similar work within the type system by treating them as special exceptions to the type rules (in a footnote, no less) and apparently nobody realized that those functions could not then be implemented in Standard C.  There is no excuse for compilers to punitively enforce such nonsense through BTG transformations.  

Clang architect Chris Lattner seems to believe there is nothing the compiler developers can do: 

The important and scary thing to realize is that just about *any* optimization based on undefined behavior can start being triggered on buggy code at any time in the future. Inlining, loop unrolling, memory promotion and other optimizations will keep getting better, and a significant part of their reason for existing is to expose secondary optimizations like the ones above. To me, this is deeply dissatisfying, partially because the compiler inevitably ends up getting blamed, but also because it means that huge bodies of C code are land mines just waiting to explode.

Lattner is absolutely correct about the “land mines” which this approach to “optimization” produces but he is wrong to imply that the compilers are required to adopt the BTG  approach to undefined behavior or that the code triggering undefined behavior is necessarily “buggy”.  C compilers could optimize on undefined behavior, as they always have, without violating the guidelines suggested in the standard: as Intel’s ICC still does. The choice to go beyond the guidelines is an engineering choice.

The invisible advantages of BTG optimizations 

Some of the defenders of BTG optimizations argue that code producing undefined behavior is not actually C code and so doesn’t have a meaning. That kind of moronic legalism has no support in the history of the language, the behavior of compilers (which have no trouble compiling and producing expected behavior from the same code prior to “optimization”) or the language of the standard – it’s just a pompous way of excusing crappy engineering.  The C-Standard is a train-wreck on its own, but it does not mandate the bizarre paradoxical undefined behavior GCC and Clang are choosing to haphazardly implement. 

The supposed motivation for BTG transformations is  that they enable optimizations. There are, however, only a bunch of dubious anecdotes to support the claim of great optimizations – not a single example of careful measurement.  And there are studies that show the opposite. Here’s the justification from one of the LLVM developers:

This behavior enables an analysis known as “Type-Based Alias Analysis” (TBAA) which is used by a broad range of memory access optimizations in the compiler, and can significantly improve performance of the generated code. For example, this rule allows clang to optimize this function:
float *P;
 void zero_array() {
   int i;
   for (i = 0; i < 10000; ++i)
     P[i] = 0.0f;
into “memset(P, 0, 40000)“. This optimization also allows many loads to be hoisted out of loops, common subexpressions to be eliminated, etc. This class of undefined behavior can be disabled by passing the -fno-strict-aliasing flag, which disallows this analysis. When this flag is passed, Clang is required to compile this loop into 10000 4-byte stores (which is several times slower), because it has to assume that it is possible for any of the stores to change the value of P, as in something like this:
int main() {
  P = (float*)&P;  // cast causes TBAA violation in zero_array.
This sort of type abuse is pretty uncommon, which is why the standard committee decided that the significant performance wins were worth the unexpected result for “reasonable” type casts.

This is a ridiculously contrived example, of course. The program, as written, makes no sense and will crash on any conceivable system.  If not setup to fail, any half-decent C programmer could have used a local variable parameter for the pointer (in which case -fno-strict-aliasing has no effect) or called memset directly. These are the kinds of things that C programmers find and fix using profiling.  Oddly, the compiler misses the big optimization, which is to do nothing – since nothing is done with the array before the program exits. So here we have the usual elements of an example of why C compilers need to make “unexpected” BTG results out of undefined behavior

  1. A minor or unnecessary optimization
  2. of a contrived example that shows nothing about actual applications
  3. that is possibly a undefined behavior optimization but not a BTG transformation
  4. that is accompanied by absolutely zero in the form of measurement
  5. that ignores actual optimization possibilities.

A  talk by Google’s Clang expert, Chandler Carruth on this topic has an even worse example.  Carruth gives an example of array indexing that has all of the first 4  characteristics above, plus the benefit that if you look at the generated code, the supposedly much better form taking advantage of undefined behavior for optimizations is not really any better than the original.  He also explains how the compiler cannot determine whether programs are correct (!) and makes an unfortunate analogy between compiler behavior and APIs. Imagine if operating systems, for example, were to caution developers that “anything can happen” if the parameters passed to an OS system call are not as expected. Programming languages should minimize surprise.

In the real world, the justification for BTG transformations is convenience for the compilers – which are increasingly driven by not only artificial benchmarks, but by their support for C++ and Java which are languages that have very different semantics. Operating on intermediate codes that are stripped of C specific semantics, it is difficult for the compilers to safely apply optimization passes without making dangerous C transformations. Rather than addressing the weakness of this internal representation of code, it’s easier to find loopholes that allow the customary semantics to be ignored.


C is not Java

Here’s something more from Lattner. 

Violating Type Rules: It is undefined behavior to cast an int* to a float* and dereference it (accessing the “int” as if it were a “float”). C requires that these sorts of type conversions happen through memcpy: using pointer casts is not correct and undefined behavior results. The rules for this are quite nuanced and I don’t want to go into the details here (there is an exception for char*, vectors have special properties, unions change things, etc). This behavior enables an analysis known as “Type-Based Alias Analysis” (TBAA) which is used by a broad range of memory access optimizations in the compiler, and can significantly improve performance of the generated code.

To me: “The rules for this are quite nuanced and I don’t want to go into the details here (there is an exception for char*, vectors have special properties, unions change things, etc)” means, “we mucked up the standard and we are going to cause many systems to fail as these nuanced rules confuse and surprise otherwise careful and highly expert programmers”.   Unexpected results are a catastrophe for a C programmer. Limitations on compiler optimizations  based on second guessing the programmer are not catastrophes ( and nothing prevents compiler writers from adding suggestions about optimizations). Programmers who want the compiler to optimize their algorithms using clever transformations should use programming languages that are better suited to large scale compiler transformations where type information is clear indication of purpose. As Chris Lattner notes:

It is worth pointing out that Java gets the benefits of type-based optimizations without these drawbacks because it doesn’t have unsafe pointer casting in the language at all.

Java hides the memory model from the programmer to make programming safer and also to permit compilers to do clever transformations because the programmer is not permitted to interact with the low level system.  Optimization strategies for the C compiler should take into account that C does not put the programmer inside a nice abstract machine.  The  C compiler doesn’t need to force C to become a 5th rate Pascal or Swift.  Unexpected results are far more serious a problem for C than missed minor optimizations.

Sponsored Post Learn from the experts: Create a successful blog with our brand new courseThe WordPress.com Blog

Are you new to blogging, and do you want step-by-step guidance on how to publish and grow your blog? Learn more about our new Blogging for Beginners course and get 50% off through December 10th.

WordPress.com is excited to announce our newest offering: a course just for beginning bloggers where you’ll learn everything you need to know about blogging from the most trusted experts in the industry. We have helped millions of blogs get up and running, we know what works, and we want you to to know everything we know. This course provides all the fundamental skills and inspiration you need to get your blog started, an interactive community forum, and content updated annually.

Sorting algorithms as recursive functions

This paper is an experiment in presenting programming algorithms as recursive functions, without pseudo-code or genuine code. The algorithms presented are the standard basic sorting algorithms with some computational complexity analysis. The style is that of ordinary working mathematics although the case statements of the recursive functions look a lot like pattern matching.



[pdf-embedder url=”https://yodaiken.files.wordpress.com/2020/10/fa700-sort-1.pdf”%5D

Large scale state machines

Computer system designs and implementations, even not very big ones, are often difficult to understand thanks to
enormous state sets and complex interactions between connected components (which may or may not change state
independently). While many algorithms can be satisfactorily described using a combination of informal math and
pseudo-code (see [Knu97] for example), operating systems, networks, distributed consensus, computer architectures
and other systems that involve complex state are more of a challenge. State machines are a good model, but the
familiar ways to define state machines don’t scale. A state table or transition graph can only be so big before it’s
not comprehensible. These methods are also too specific, too detailed, without straightforward ways to abstract or
parameterize. For example, if we draw a state diagram for a mod 5 counter, it doesn’t help with a mod 500 counter.
As an alternative, deterministic state machines can be represented as maps f : A* → X where A is a finite set of
discrete events (an alphabet of events) and A is, as usual, the set of finite sequences over A.


[pdf-embedder url=”https://yodaiken.files.wordpress.com/2020/10/b77d0-brief_informal.pdf”%5D

aperiodic and cyclic state maps

Some rough notes on modularity, composite state maps, cascades and aperiodics.

The larger question is how to factor discrete systems into modules. Suppose (f:E^*to X) where (E^*) is the set of finite sequences over (E). This question is then about the structure of (g(s)= h(g_1(u_1),dots g_n(u_n)))  so that (g(s)=f(s)) and the (u_i) are dependent sequences as defined previously.  To start with something simple, suppose, (f(s)) counts events mod (k) for some (k > 1).

[fepsilon = 0quad f(s.e)= f(s)+1bmod k]

where (epsilon) is the empty sequence and (s.e) is the sequence obtained by appending event (e) on the right to sequence (s). These two equations define (f) on all elements of (E^*). One question is whether (f) can be constructed from 2-state components as a cascade product  even though it cycles. The answer is obvious to anyone who has seen a carry-lookahead adder or possibly to anyone who has seen Lagranges theorem in group theory or the Krohn-Rhodes theorem. If (k= 2^n) for some (n>1) then the answer is “yes, (n) binary-state components in series (for the carry) do the job (the underlying group for (f) then is (Z_k) ).

Let (E_2={0,1}) and  (beta(epsilon) =0 ) and (beta(s.e) = beta(s)+ebmod 2). Then let (g(s) = Sigma_{i=1}^n 2^{i-1}*beta(u_i(s)))) where (u_iepsilon =epsilon) as usual, (u_1(s.e)=u_1(s).1) and

[ mbox{for }i>1, u_i(s.e) = u_i(s).xmbox{ where }x = Pi_{j=1}^{i-1}f_{j}(u_{j}(s)))]

Then (g(s) = f(s)) and the (x) above is the computed carry.   But the implication is that if, for example (k) is prime, then we need a cycle somewhere. If that’s right, it’s amazing.  Number theory is the king of all.

Aperiodic state maps, for example, ones that do saturation counting (0,1,2,dots n,n,ndots) should also factor according the the Krohn-Rhodes insight. But in that case, I don’t see good “modular” decomposition. Say (f) is aperiodic iff there is some (n) so that for any sequence (s) of length (n) or more and any event (e) in the alphabet of (f), it must be that (f s = f(s.e)).  Then let (E^f) be the finite set of all sequences over (E) of length (n) or less and let (h(s) = f(s)) for (sin E^f).  We define a store machine so that

[ store(epsilon)notin E,quad store(s.e) = e mbox{if }store(s)notin embox{ and }store(s.e)=store(s)mbox{ otherwise}]

Let (g(s) = h(concat(store(u_1(s)),dots, store(u_n(s))))).  Set (u_1(s)=s). Set (u_{i+1}(s.e) = u_{i+1}(s)) if (beta(u_i(s)) notin E) and (u_{i+1}(s.e) = u_{i+1}(s).e) otherwise. Then the factors “remember” the first (n) events.  But each component has (|E|+1) possible states and so, even though these “store” machines are themselves aperiodic and can be easily reduced to ( log_2(|E|+1)+1 ) bits, the reduction is hard to prize.



The halting problem has nothing to do with computers


“Turing’s work was of course a great contribution to the world of mathematics, but there is a question of exactly how it is related to the world of computing. – Wilkes

The undecidability of the halting problem for Turing machines is not relevant to computer programming and provides no information about whether there can be computer programs that automatically check the correctness of other computer programs. Knowing that some problem is Turing decidable or Turing undecidable doesn’t mean you know anything about how hard it is for actual computer programs on physical computers.

  1. A problem can be Turing decidable, even trivially Turing decidable, but impossible or unfeasible for computer programs on physical computers.
  2. A problem can be Turing undecidable but solvable by computers in practice.
  3. Computer programs on physical computers are finite state, so their halting problems and correctness are trivially decidable by Turing machines.

Much of the confusion comes from sloppy definitions of “program” and “halting problem”.  Martin Davis appears to have coined the term halting problem to refer to the problem of deciding whether a Turing machine ever halts when started with a particular configuration.  That holting problem is not Turing decidable. There is  no Turing machine which given input ((n,m)) is guaranteed to decide whether Turing machine (T_n) with input (m) will eventually halt.  But there are Turing machines  which  take input ((s,n,m)) and decide if (T_n) halts within (s) steps (or (2^s) steps or (10^{10^{100000times s}})) steps. It is Turing trivial to determine whether a Turing machine halts within the expected lifespan of the Milky Way given the most optimistic non-zero time for each step to complete.  For most purposes, a Turing machine that takes galactic time to complete a computation never completes it.

Even in foundations of mathematics, the difference between “never halts” and “takes too long” is not always important.  Kurt Gödel wrote that if decision processes worked in linear or exponential time, “this would have consequences of the greatest importance . Namely, it would obviously mean that in spite of the undecidability of the Entscheidungsproblem, the mental work of a mathematician concerning Yes-or-No questions could be completely replaced by a machine. After all, one would simply have to choose the natural number n so large that when the machine does not deliver a result, it makes no sense to think more about the problem.”   

It’s trivially Turing decidable whether a Turing machine can complete within some time using some fixed number of states but is it feasible for computer programs to decide that?  It seems hard. The two criteria: “feasible for computer programs” and “Turing decidable” are very different.  However, the critical importance of the halting problem for Turing machines in computer science is folkloric and anyone bringing up these pesky facts will likely get one of the standard confused responses

  1. Yes, actual computers are finite state, but golly they have a lot of states so it’s pretty much the same thing (nope).
  2. If someone keeps adding or swapping out CDs or disk drives, actual computers are really infinite state (that is a halting problem for a system comprised of a computer, an immortal person to swap CDs, infinite matter, and unlimited time).
  3. Turing machines are good models of real programs (perhaps, but the halting problem is not relevant to that use.).
  4. Python, or Lisp or Haskell or something, have unbounded integers. (No they do not.)
  5. etc.

Much of the confusion is caused by mixing-up informal explanations with mathematics. Turing’s Entscheidungsproblem  paper proposes a specific model of “computable”, arguing that the informal notion of “algorithm” in mathematics can be formalized by a type of automata that is now known as Turing machine.  Once this notion is made precise, it is possible to look at the properties of these algorithms and to prove some theorems about what Turing machines can and cannot compute. But “Turing machines” are not physical computers.  The first sentence of the current Wikipedia page on “Halting Problem” is a good example of imprecision leading to error:

“In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running (i.e., halt) or continue to run forever. “

The definition of  “computability theory” linked in the text starts “Computability theory, also known as recursion theory“, firmly placing the topic in metamathematics. But the definition of “computer program” linked in the text is concerned with programs on physical digital computers. Determining whether such a program halts is Turing Trivial yet the next sentence in the Wikipedia article jumps back to Turing machines:

Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist”.

So we’ve jumped from recursion theory to  computer programs on physical computers to calling Turing machines “programs”. And then a jump back:

“The theoretical conclusion of not solvable is significant to practical computing efforts, defining a class of applications which no programming invention can possibly perform perfectly.”

Nope. Similar sounding words don’t necessarily have the same meaning. A mathematical proof about the limitations of “computable functions”  should not be handwaved around to show something about physical computers and programs running on those computers.


Semiotics of functional programming


In contrast to the usual imperative languages (e.g., C, Fortran, and Ada), in which variables represent cells in memory that can be modified using the assignment operator = (or  :=), functional languages view the use of the = operator as an expression of an equation. For example, if a functional program contains the declaration let x = f(y) then this would introduce the name x and assert that the equation x = f(y) is true. There is no notion of a memory cell, and certainly no notion that somehow later in the program x might change (so that the equation would become false) –  Goldberg, Functional Programming Languages, AMC Computing Surveys, March 1996.

Here’s a common mathematical expression

(y = Sigma_{x=1}^{x=10}x)

The value of “x” changes from 1 to 10. Here’s part of the description from the great text by Aleksandrov, Kolmmogrov, and Lavrent’ev survey of mathematics chapter on Approximate Calculation of Roots.

gamma_1 = - frac{(b-a)fa(a)}{f(b)-f(a) +a }.
Taking this number for the new b in cases I and II and for the new a in cases II and IV we find …

This “new b” is not the same as the old b. And it’s not just in Soviet era Rusian math texts that we find such heretical material. Even in Abstract Algebra texts, we can find texts where “Let x be an element of the group G”  later is contradicted by “Let x be an element of the ring R” or, worse, where “x = 0” is later contradicted by “x ≠  0” .  Here’s more Functional Programming Doctrine:

As in mathematics, a function is a single-valued relation such that, given the same argument(s), it will return the same result. This is certainly not the case in the imperative programs.

Are functions single valued relations in mathematics? I guess, if we consider sets, vectors, and matrices and other collections to be single values. But given the same arguments, do they always return the same result? Not really:  f(x) = mx+b depends on the parameters “m” and “b” as well as on the argument “x”.   Consider the C function “f” defined by:

int f(int x){ static int b = 0; return x+ (b++); }

So ” x = f(1)+f(1)” is definitely not equal to “2*f(1)”  (it is equal to 3 )*. You could look at this as: “C is a terrible construct of math illiterates who probably didn’t even know how to spell monod”,  or you could see it as:  parameter b changes between invocations of “f” . In fact, you could look at the original crime in FORTRAN where “x = 2” was introduced as a statement and think: this is a rational convention in response to both the absence of subscripts on teletype machines and the inability of computers to understand what “…” means in  expressions like (x_1 =2; x_{2} = x_1^2, …. x_n^n ) .   Or consider a recursive function

[ f(x) = begin{cases}1 &mbox{if }xleq 0\
x* f(x-1) &mbox{otherwise}
end{cases} ]

If we calculate this for (f(2)), we get (x =2) and then (x = 1) and then (x = 0). Which is it? The answer is that “x” takes on different values as the calculation changes state!

So – as Socrates said – part of the problem here is that the differences between (a) axiomatic (declarative) definitions and (b) algorithmic definitions and (c) non-mathematical definitions, are not properly acknowledged in much of the functional programming literature.  Calculations involve state change – of something. For most of the 20th century a “computer” was a person who calculated (computed).  Like Katherine Jonson, Dorothy Vaughan,  and Mary Jackson at NASA  they carried out engineering and science (and cryptography) calculations before digital computers were widely available. Imagine such a human computer using  Euler’s method. The “state of the computer” (the person doing the calculation) changes during the calculation as she iterates through (and possibly writes down intermediate results on paper). Similarly, electronic computers change state as they compute. Nothing to be afraid of here.

Not that the FP critique of imperative programming is all wrong.  Functions that only modify their return value are often much easier to understand and validate and,  too much programming still makes unstructured modifications to state – analogous to the old spaghetti code with unstructured control. New methods for clarifying what state can be changed by code sections and for facilitating modular decomposition of state would be useful.  Fetishizing an axiomatic approach to function definition as “mathematical” seems less so.


* I hope.




Von Neumann’s critique of automata theory and logic in computer science

These remarks from “THE GENERAL AND LOGICAL THEORY OF AUTOMATA” 1947 are, not surprisingly, enormously insightful. Von Neumann essentially predicts the emergence of the field of analysis of algorithms and algorithmic complexity.  Not only does he point out the importance of understanding how many steps it will take to carry out a computation, but the use of asymptotic measures and other tools from analysis and  working mathematics proved to be as important as von Neumann suggested.  The cautions about formal logics are important still, because too much of  theoretical CS is still fixated on foundational methods and formal logic, despite the “refractory” nature of of the methods.


There exists today a very elaborate system of formal logic, and, specifically, of logic as applied to mathematics. This is a discipline with many good sides, but also with certain serious weaknesses. This is not the occasion to enlarge upon the good sides, which I have certainly no intention to belittle. About the inadequacies, however, this may be said: Everybody who has worked in formal logic will confirm that it is one of the technically most refractory parts of mathematics. The reason for this is that it deals with rigid, all-or-none concepts, and has very little contact with the continuous concept of the real or of the complex number, that is, with mathematical analysis. Yet analysis is the technically most successful and best-elaborated part of mathematics. Thus formal logic is, by the nature of its approach, cut off from the best cultivated portions of mathematics, and forced onto the most difficult part of the mathematical terrain, into combinatorics. The theory of automata, of the digital, all-or-none type, as discussed up to now, is certainly a chapter in formal logic. It would, therefore, seem that it will have to share this unattractive property of formal logic. It will have to be, from the mathematical point of view, combinatorial rather than analytical.

Probable Characteristics of Such a Theory. Now it seems to me that this will in fact not be the case. In studying the functioning of automata, it is clearly necessary to pay attention to a circumstance which has never before made its appearance in formal logic. Throughout all modern logic, the only thing that is important is whether a result can be achieved in a finite number of elementary steps or not. The size of the number of steps which are required, on the other hand, is hardly ever a concern of formal logic. Any finite sequence of correct steps is, as a matter of principle, as good as any other. It is a matter of no consequence whether the number is small or large, or even so large that it couldn’t possibly be carried out in a lifetime, or in the presumptive lifetime of the stellar universe as we know it. In dealing with automata, this statement must he significantly modified. In the case of an automaton the thing which matters is not only whether it can reach a certain result in a finite number of steps at all but also how many such steps are needed. There are two reasons. First, automata are constructed in order to reach certain results in certain pre-assigned durations, or at least in pre-assigned orders of magnitude of duration. Second, the componentry employed has in every individual operation a small but nevertheless non-zero probability of failing. In a sufficiently long chain of operations the cumulative effect of these individual probabilities of failure may (if unchecked) reach the order of magnitude of unity-at whichpoint it produces, in effect, complete unreliability. The probability levels which are involved here are very low, but still not too far removed from the domain of ordinary technological experience. It is not difficult to estimate that a high-speed computing machine, dealing with a typical problem, may have to perform as much as 10′ 2 individual operations. The probability of error on an individual operation which can be tolerated must, therefore, be small compared to 10-12. I might mention that an electromechanical relay ( a telephone relay) is at present considered acceptable if its probability of failure on an individual operation is of the order 10-s. It is considered excellent if this order of probability is 10^-9. Thus the reliabilities required in a high-speed computing machine are higher, but not prohibitively higher, than those that constitute sound practice in certain existing industrial fields. The actually obtainable reliabilities are, however, not likely to leave a very wide margin against the minimum requirements just mentioned. An exhaustive study and a nontrivial theory will, therefore, certainly be called for. Thus the logic of automata will differ from the present system of formal logic in two relevant respects. 1. The actual length of “chains of reasoning,” that is, of the chains of operations, will have to be considered. 2. The operations of logic (syllogisms, conjunctions, disjunctions, negations, etc., that is, in the terminology that is customary for automata, various forms of gating, coincidence, anti-coincidence, blocking, etc., actions) will all have to be treated by procedures which allow exceptions ( malfunctions ) with low but non-zero probabilities. All of this will lead to theories which are much less rigidly of an all-or-none nature than past and present formal logic. They will be of a much less combinatorial, and much more analytical, character.

Spanner in the works

The developers of Google’s Spanner database built an interesting system but chose to build their own clock synchronization technology instead of getting a more precise off the shelf solution and chose to make extensive use of the notoriously complex Paxos distributed consensus algorithm even though they were incorporating clock sync into their system. Whatever the rationale at Google, building a clock sync enabled high speed distributed transaction system is much less complex if you don’t make those choices.

Off the shelf technology that can synchronize clocks to under a microsecond in high end systems and can even produce tight clock sync in the decidedly not optimal situation of cloud virtual machines is readily available (www.fsmlabs.com). And with clock sync, it is possible to enormously simplify distributed consensus. In fact, it’s hard to understand why Paxos is so widely used. Paxos is a highly complex (ingeniously complex) and painful to implement attempt to solve a problem that doesn’t exist.

The humorously titled “Paxos made simple” paper starts off discussing the problem of distributed consensus in completely asynchronous networks where timeouts won’t work and clocks are not synchronized at all. In practice, computer networks are not totally asynchronous, they do permit precise clock synchronization (and that technology has gotten much better since Paxos was introduced in 1998), and Paxos doesn’t even work in a totally asynchronous network. Page 7 of “Paxos made simple” points out that the algorithm can fail to converge on a solution, it is not “live”. Lamport, the author, then suggests electing a single coordinator or primary agent – using timeouts! Once you admit that timeouts work, you can replace Paxos with far simpler algorithms, some well known in database systems since the 1970s. Precise clock sync permits much more radical simplifications.

See also

Paxos  and Distributed Consensus  and ABCD.

Comrades Cheyne and Stokes and mathematics

There is a great book of stories by Soviet era radio astronomer Iosef Shklovsky called 5 Billion Vodka Bottles to the Moon which has a this story (which I tell from memory, no guarantees)

In 1953 a young Jewish math student found himself in jail. This was during the “Doctors Plot” purge where it was announced that Soros  the Jewish doctors attending Stalin had plotted to send him off to his reward. So the usual suspects were rounded up including the student in question. He was listening to the radio with a cellmate, a Lithuanian doctor, and they heard the news that Comrade Stalin’s illness had worsened and he was, in fact, exhibiting Cheyne/Stokes breathing. “Who are these Comrades Cheyne and Stokes? ” asks the math student. “They are very serious comrades”, replies the doctor, “Let’s have a drink to celebrate”. Shortly thereafter, the Friend of All Poets, as Stalin called himself sometimes, was buried and the Doctors Plot roundup ended and many people were freed from jail. The student went on to get a Ph.D. in mathematics which he dedicated to “Doctors Cheyne and Stokes, without their assistance, this dissertation could never have been completed.”