monads and the delusional basis of "functional programming"

I’m struggling to try to understand something more about functional programming but so far have seen nothing to persuade me that programs should be considered to be mathematical “functions”. We don’t have mathematical objects like integers and reals and functions in programming, we have bit patterns that represent  mathematical objects and other kinds of objects and we have algorithms that implement functions, usually they only approximate those functions. Pretending otherwise creates complexity for no reason.  Programs are mechanisms for producing or modifying bit patterns.  It’s obviously useful to be able to treat, for example, a variable of unsigned integer type as if it were an integer, but it’s important to remember that it’s a bit pattern representing an integer, not an actual integer.

It’s also worth noting that the notion of types in working mathematics is much more fluid and “informal” than the type systems that theoretical computer scientists prefer.  There is no reason to believe that even if types can be organized in some sort of hierarchy, doing so produces any illumination at all.

Russell introduced what is often viewed as his most original contribution to mathematical logic: his theory of types—which in essence tries to distinguish between sets, sets of sets, etc. by considering them to be of different “types”, and then restricts how they can be combined. I must say that I consider types to be something of a hack. And indeed I have always felt that the related idea of “data types” has very much served to hold up the long-term development of  programming languages. (Mathematica, for example, gets great flexibility precisely from avoiding the use of types—even if internally it does use something like them to achieve various practical efficiencies.) – Wolfram.

Understanding Paxos and Distributed Consensus

vase

(minor wording correction and more complaining added 10/2/2016,
 minor edits 10/5/2016)

Multi-proposer Paxos is a very clever and notoriously slippery algorithm for obtaining distributed consensus. In this note I try to explain it clearly and provide a correctness proof that gives some intuition why it works – to the extent that it does work. I am specifically concerned with Multi-Proposer Paxos, the first algorithm discussed in “Paxos Made Simple”.  What is often called “Paxos” involves single Proposer variants which are much simpler and less interesting.

I think this is right – let me know if you spot an error.

Rules for how Paxos works

There is a finite set of processes or network sites, some of which are Proposers and some Acceptors (the sets can intersect). Each proposer has a unique id, confusingly called a sequence number. A proposal is a pair consisting of a proposal value and the sequence number of the Proposer. The goal is to ensure that if two Proposers believe that they have convinced the Acceptors to come to a consensus on a value, they must both agree on the same value, even though they may disagree on sequence number. The most clever part of Paxos is the observation that since we don’t care which value wins, even though we do care that some unique value wins, we can force Proposers to inherit values of the most likely previous proposal.

  1. Proposers can ask Acceptors to approve sequence numbers and to accept proposals which include a value and the Proposer’s sequence number. Acceptors do not have to approve or accept but are limited to approving and accepting what Proposers send them.
  2. When an Acceptor approves a sequence number it:
    1. Promises to not approve any smaller sequence numbers
    2. Promises to not accept any proposals with smaller sequence numbers
    3. Returns to the Proposer the proposal with the highest sequence number it has already accepted, if any.
  3. The Proposer cannot send any proposals or select a value for a proposal until it gets approval for its sequence number from a majority of Acceptors.
  4. Once the Proposer has approval from a majority of Acceptors it must select the value of the proposal with the highest sequence number sent to it during the approval phase (the inherited proposal). If the approval phase did not turn up any accepted proposals, the Proposer can pick any value. In this case the Proposer “originates” the value.
  5. Once the value is selected, the Proposer can never change the value and can only propose the pair of that value and its sequence number – unless it increases its sequence number, abandons the proposal and value, and starts over.
  6. The choice of a new sequence number must preserve the property that each sequence number belongs to only one Proposer, ever.

(see the  code for what this looks like in a simulation)

Why it works intuition

The first thing to see is that individual Acceptors are forced to order their approvals and accepts by sequence number. If an Acceptor has approved j and accepted (i,v) and j>i then we know that it must have approved j after it accepted (i,v). The sequence of operations for that Acceptor must act like this:

Continue reading “Understanding Paxos and Distributed Consensus”

circularity problems in distributed consensus

Distributed consensus involves organizing a collection of independent agents – processes or network sites – to agree on some value or sequence of values.  Many distributed consensus methods depend on a leader-follower scheme in which the leader is an agent that essentially tells the followers what the values are. The challenges in such methods are to determine when enough of the followers have accepted the value and how to recover from failures of agents. In particular, failures of the leader trigger some procedure to select a new leader.  Leader election, however, is a distributed consensus problem. In fact, leader election is the harder problem. Once there is a leader, consensus in the followers can be produced by a dead simple protocol (see the second part of this ).  Oddly, leader election is generally treated as a minor issue. For example, in “Paxos made simple” we read:

The famous result of Fischer, Lynch, and Patterson [1] implies that a reliable algorithm for electing a proposer must use either randomness or real time—for example, by using timeouts. However, safety is ensured regardless of the success or failure of the election.

The FLP result is essentially a tautology: if an agent doesn’t ever get any information that reliably distinguishes between failure and slow response in a second agent, the first agent cannot reliably distinguish between failure of the second agent and slow response.  So the import of the first sentence is that leader election depends on timeouts or “randomness” (perhaps this means some analysis of probability of failure scenarios).  I don’t think this is correct, but it’s an interesting claim. The second sentence says nothing more than that an algorithm that fails to progress will never produce a false result – which I think is also a dubious claim.

Algorithm P solves problem X by assuming some other mechanism solves X and then by using that mechanism to make problem X simpler.  Ok.

 

state equations in practice

 

When people think of mathematical models of state for programs and other computer systems, it’s natural and conventional to consider state as a map from symbolic names of state variable to values. This is an error for a number of reasons including, most critically, the lack of any compositional structure. Let’s step back and think about what a discrete state system looks like.

Notation warning: I’m using _ for subscript x_p or f_r or one_(two).

  1. We have a set E of discrete “events” called the event alphabet, The event alphabet defines all the possible events and inputs that can cause state to change.
  2. A finite sequence of events over E can be thought of as a system history, describing what series of events drove the system from its initial state (which is the point at which no events at all had happened.)  You could complain at this point that deterministic, single thread of execution (the sequence) models can’t describe non-determinism, parallelism, concurrency, and abstract properties, but you’d be wrong – keep reading.
  3. Now we are going to take an odd step – I don’t want to start with a state machine or other object that represents the whole system. I want to start with state variables that express some system property and how it changes in response to events. For example, a state variable Executing might tell us the process identifier of the currently running (executing) process or it could be set valued in a multi-processor environment. And priority_ might be the priority of the process with identifier, p.  As the system advances state, the values of the state variable change but often we want to have properties that are true in any state. For example,  (Executing=p and q in Waiting and priority_p>  priority_qimplies TimeSinceSwitch < t)  would be a good property for some operating system that needs to schedule by priority but also let lower priority processes advance.
  4. State variables are dependent variables that depend on the event sequence and some map that extracts  information from the event sequence. The event sequence is the free variable and the state variable is a dependent variable that embodies some aspect of system state. Everything that determines current state is in the event sequence but state is not a completed thing, not an discrete object, it is the cumulative information of multiple state variables depending on that event sequence. The same event sequence may correspond to completely different collections of state variables for different systems.
  5.  We might have many state variables to specify how an operating system works or even how a simple gate works. Let’s think of all of them at the same level of abstraction as depending on the same event sequence (finite, of course).
  6. The key to this approach is to be able to compactly define state variables and abstract properties of state variables even though the event alphabets and maps on sequences of events will often be enormously complicated.
  7. Say a variable y is a state variable for an event alphabet E if  y=f(σ) for some function f where σ is a free variable in the set of finite sequences over E which includes the empty sequence Nil.  The map f is a solution for y  that extracts values from the sequence as state advances (as events are appended to the event sequence). Essentially y depends on two parameters: the sequence  and the map. These parameters are often implicit.
  8. By the definition of state variable, if is a state variable, then  y=f(σ)  for some f so y(z) = f(z)  by the usual convention.
  9. If y is a state variable then y(Nil) is the value of  in the initial state since  y(Nil) = f(Nil).
  10. The convention here is that σ is always a free variable over E* (the set of finite sequences over alphabet E).
  11. If is a sequence and e is an event or a variable over events, write Append(z,e) or just ze for the sequence obtained by appending e to z on the right.  So y(σe) is the value of y in the “next state” – the state after e – since y(σe) = f(σe). 
  12. If  y = y(σe) then the event “e” leaves the value of unchanged.
  13. If y(σe) = y +2  then the event “e” increases the value of  by 2. The equation  y(σe)=y+2 can be rewritten as f(σe)=f(σ) + if we know f. 
  14. A primitive recursive definition on the sequence will completely define the solution  if the recursive map is defined. So  [y(Nil)= k and y(σe) = h(e, y)] defines f if h and are defined.  Note y(Nil) = f(Nil)=k. And y(σe)= h(e,y) = h(e,y(σ)) = h(e,f(σ)). Suppose sequence w=<a,b,c>, then y(w) = f(<a,b,c>) = h(c,f(<a,b>)) = h(c,h(b,f(<a>)) = h(c,h(b,h(a,k)))).  In many cases we don’t want to or can’t specify f completely – f is a solution to the constraints on y and there may be many or infinitely many or no solutions.
  15. For example, suppose C is a state variable and C(Nil)=0 and C(σe) = 1+C. Then we have defined C to be the length of the event sequence.
  16. Define L(σ e) = e so L is the most recent event. Note we do not need to define L(Nil).
  17. Define [ SinceReset(Nil) = 0 and SinceReset(σ e) = ( 0 if e=RESET and  1+ SinceReset otherwise).
  18. Suppose we have a state variable Clock that somehow determines time passed since initial state from the event sequence. Then (Clock(σ e)- Clock) is the duration of event e in this state (and may be different for different values of  σ ). Define  waiting_p(Nil)=0 and waiting_p(σ e)=0 if Executing(σ e) =p and   waiting_p(σ e)= waiting_p+(Clock(σ e)- Clock)  otherwise. So  waiting_p is the time that process has been waiting to become the executing process. We might want a property (waiting_p > k only if priority_p < priority_p where q = Executing).
  19. Ok, now we can consider component composition – which is just function composition. Consider the state variable L defined above and suppose we have a new alphabet B consisting of Left_x and Right_x for x in some set X of data values. The goal is to construct a shift register from instances of L.  Define Li =  L(ui) where ui is also a state variable that has values that are sequences of elements of X. Usually we want components to state in the initial state so ui(Nil) = Nil.  This means Li (Nil)= L(Nil) which is something we didn’t define. Now define ui(σ e) where “e” is an element of the event alphabet of the composite system.  Remember I’m using juxtaposition to mean “append on the right” so, for example (u_iL_(i-1) ) means “append the current value of L_(i-1) to the current value of “
    1. u_i(σ e)=  u_i x if i=1 and e=Right_x or if i=n and e=Left_x
    2. u_i(σ e)=  u_i L_(i-1) if i>1 and

    e=Left_x

  20. u_i(σ e)=  u_i  L_(i+1)  if i<n and e=Right_x
  • The definition above causes all n components to change state in parallel when an event is appended to the event sequence of the composite system – but it’s just composition: L_i = L(u_i)=  L(u_i(σ)) and L_i(σ e)= L(u_i(σ e) ) =L(u_i(σ)e’) = e’ where e’ is calculated as above.
  • If  a device D that is constructed from components  C_1, … C_n  that are interconnected. D is a function of σ but the components don’t “see” the same events. For each component, we need a state variable u_i that is sequence valued and a function of  σ . Usually these are defined recursively: u_i(Nil)= Nil  and u_i(σe)=  concatenate(u_i,h_i(e,C_1(u_1)…  C_n(u_n ))) or something like that. That is, each event e for D causes event or events  h_i(e,C_1(u_1 )…  C_n(u_n ))  to be appended to the sequence of events u_i.

Operations and maps on finite sequences

A lot of what I’m trying to do with mathematical models of computer systems involves operations on finite sequences.

Define a “finite sequence of length n>0” to be any total map f: {1 … n} → X for some set X. The 0-length sequence is the null map  “nulls”. If f is a finite sequence of length n, then g = f affix c is the map g: {1 … n+1}   → X ∪ {c}  so that g(i)= f(i) for  i ≤ n and g(n+1) = c.  Also, if g is length n>0 then there is an f of length n-1  and some c so that g = f affix c .

A primitive recursive function on finite sequences is given by the rules  F(nulls) = k and  F( f affix c) = G(c,F(f)). 

For example we can define prefix by  (c prefix nulls) = (nulls affix c) and ( c prefix (f affix d)) = (c prefix f) affix d.

I use affix instead of append because prepend is ugly.

Two observations about common usage in computer science that differs from standard mathematical practice. First, note how I’m defining sequences as maps but being imprecise about the function images, about the set X. In CS we really only have one type of object – a sequence of binary digits- so X is always a set of finite sequences of binary digits. This is a fundamental property of computing. So if I wanted to be pedantic, I’d first define the set B of finite sequences of binary digits as the set of all maps g:{1 … n} → {0,1}  plus the null map, and then define general finite sequences as maps  {1 … n} → B.  But we use those binary sequences as representations of other mathematical objects and even objects that are usually not considered mathematical objects such as MPEG encodings of video.  So it’s convenient to speak of a sequence of integers or strings or a sequence of assorted objects without making the translation between representation and denotation  (connotation? ). The second  observation is that, for the same reason, second order functions are not unusual in computer science – e.g. defining prefix as a function that operates on sequences which are also functions. Note however that in non CS applied math, second order functions are also common.

Painting is The Mathematician by Diego Rivera.