OpenBSD developer notes king’s clothing is “virtual”

Theo de Raadt explains why virtualization does not improve security. How about this: to improve security, you have to have a secure design; a marketing buzzword won’t do the trick. Anyone who has seriously looked at the current generation of x86 virtualization hardware knows that it does not provide clean separation or levels of security. It is possible, with great care, to use it in a way that improves some types of security, but mostly it seems like a way of justifying the use of SUV monster servers.

If you look at Xen code, for example, you can see that it copies huge chunks of Linux. The argument seems to be that if Linux does not have security and fails to efficiently allocate resources, Linux+ModifiedLinux will somehow do a better job. There’s no reason to believe this will happen and many reasons to believe that it will, in fact, make security more elusive. That’s not to argue against virtualization as a useful technique either: our RTMS uses a type of virtualization and we have a lot of customers who love VMware, but the hype-ervisor is disconnected from reality. The delusion that making arbitrary divisions in software can reduce complexity is a persistent one, but it has no basis.

The Linux community

According to Greg’s email, organizations that contributed more than 100
changesets to the recently released 2.6.23 kernel included: Red Hat
with 827 changesets (11.7%), IBM with 557 changesets (7.9%), the Linux
Foundation with 528 changesets (7.5%), Novell with 449 changesets
(6.3%), Intel with 242 changesets (3.4%), Oracle with 158 changesets
(2.2%), MIPS Technologies with 143 changesets (2%), Nokia with 133
changesets (1.9%), and NetApp with 119 changesets (1.7%). Greg noted
that there were a total of 7,075 changesets from 992 developers working
for 126 different employers. 843 of the changesets (11.9%) were
contributed by individuals reporting no sponsor for their work.

From http://kerneltrap.org/Linux/Who_Sponsors_Kernel_Development

I think many companies coming into Linux don’t understand how smart IBM, Red Hat, Oracle, and others have been in driving the project. After all this time, there are still many execs who think that some magical community of elves is just waiting for the right pitch to develop code in exchange for maybe some glass beads and pats on the head.

Distributed shared memory from first principles

[Update 10/16]

What is the fundamental performance limiting factor that has dominated the last 30 years of computer architecture? The obvious answer is the disparity between processor and memory/storage speed. We connect processors to cache, to more cache, to even more cache, to some bus, to memory, to disk/network, suffering costs at each step as we try to keep the damn processor busy. My guess is that memory locality is totally different now than it was back when virtual memory systems were first designed – much less pronounced. My short review of the literature shows little research on, for example, memory locality of databases or operating systems since the 1980s. (Is that wrong?) But we do know enough to understand why the entire concept of distributed shared memory is wrong, although there needs to be a catchy name for this category of error. The error is to see that some feature of computer hardware makes some type of software engineering difficult and then to provide a software mechanism that makes that type of engineering impossible. The complexity of the memory hierarchy and the difficulty of making locality work pose enormous challenges for software designers. The purpose of distributed shared memory is to make it impossible for software designers to control the cache performance of their programs by hiding caching behavior under a non-deterministic opaque layer.

(see comment by John Regehr)
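
To make the locality point concrete, here is a throwaway C sketch (mine, purely illustrative) that sums the same matrix twice: once in an order the cache likes and once in an order it hates. The arithmetic is identical; only the access pattern changes, and that is exactly the kind of behavior a programmer can no longer reason about once an opaque layer decides where the bytes actually live.

    /* Illustrative only: same work, different locality. */
    #include <stdio.h>
    #include <time.h>

    #define N 2048

    static double a[N][N];

    int main(void)
    {
        clock_t t0, t1;
        volatile double sum = 0.0;

        /* Row-major walk: consecutive accesses stay on the same cache lines. */
        t0 = clock();
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        t1 = clock();
        printf("row-major:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

        /* Column-major walk: nearly every access touches a new cache line. */
        t0 = clock();
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        t1 = clock();
        printf("column-major: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

        return 0;
    }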

more on missed wakeup

Here are some conventions [Update: typos fixed, Friday]

  • We are concerned with state machines and sequences of events. The prefixes of a sequence include the empty sequence “null” and the sequence itself.
  • Relative state: If “w” is the sequence of events that have driven a system of state machines from their initial to their current state then
    • |M is the output of state machine “M” in the current state (reached by following input sequence “w” from the initial state of M)
    • a|M is the output of M in the state reached from the current state by an “a” event.
    • z|M is the output of M in the state reached from the current state by the sequence of events “z” (if it is not clear from the context whether “z” is an event or a sequence, then we can just say which it is.)
  • Absolute state, where the sequence parameter is made explicit. This is mostly useful for defining the initial output and for when we construct composite state machines.
    • w:M is the output of state machine “M” in the state reached by following input sequence “w” from the initial state of M.
    • wa:M is the output of M in the state reached by following the sequence wa (obtained by appending event a to sequence w) from the initial state of M.
    • wz:M is the output of M in the state reached by following the sequence wz (obtained by appending event sequence z to sequence w) from the initial state of M (if it is not clear from the context whether “z” is an event or a sequence, then we can just say which it is.)
    • null:M is the output of M in its initial state.

Specific state machines

  • |Time is the current real-time.
  • |Running(p) =1 if p is executing and 0 otherwise
  • |Runnable(p) =1 if p wants to be scheduled, 0 otherwise.
  • |RequestWakeup(p,e)=1 if p is requesting a wakeup on event e, 0 otherwise
  • If |Runnable(p)=0 then a|Running(p) ≤ |Running(p) (if p is not runnable it can stop running, but not start running.)
  • If a|Runnable(p) > |Runnable(p) then there is at least one e so that a|RequestWakeup(p,e)< |RequestWakeup(p,e) (a process can only become runnable if some wakeup event happens and is cleared.)
  • If a|RequestWakeup(p,e) < |RequestWakeup(p,e) then a|Runnable(p)=1 (a request is only cleared if the process is actually made runnable)
  • If |RequestWakeup(p,e)=1 then there is a t so that whenever z|Time ≥ |Time + t then Sum(prefixes u of z; u|Runnable(p)) > 0 (if a process has requested a wakeup, it will become runnable in some bounded time – this liveness criterion is not strictly necessary)
  • Define |Safe(p) by null:Safe(p)=1 and
    • a|Safe(p) =
      • 0 if a|Running(p)=0 and a|Runnable(p)=0 and there is no e so that a|RequestWakeup(p,e)=1 (real trouble: the process is not running, is not runnable so it can never start running unless it becomes runnable, and cannot become runnable because it has no wakeup requests outstanding).
      • |Safe(p) otherwise (|Safe(p) does not change otherwise)
  • So what we want is to ensure that |Safe(p)=1. If a process requests a wakeup, then deschedules, and then gets a wakeup, everything is fine. The ordering is critical. If the wakeup happens before the process deschedules, but it is committed to descheduling, then the process gets in trouble. If the code looks like Request e; sleep; then a wakeup event in between the two operations is trouble. Note that Request e; if request still pending then sleep; is not safe unless the test and the sleep are atomic in some sense (a concrete version of this pattern is sketched in code after this list). Define |RequestOrder(p) by null:RequestOrder(p)=idle and
    • a|RequestOrder(p) =
      • requesting if |RequestOrder(p) = idle and a|RequestWakeup(p,e)>|RequestWakeup(p,e) for some e.
      • idle if a|Runnable(p) < |Runnable(p) and |RequestOrder(p)=requesting
      • error if a|Runnable(p) < |Runnable(p) and |RequestOrder(p) != requesting
      • |RequestOrder(p) otherwise.
  • |RequestOrder(p) != error does not ensure |Safe(p).
  • One solution is a lock. Suppose (Property L): If |Locked(p)>0 then a|Running(p) = |Running(p) and a|RequestWakeup(p,e) >= |RequestWakeup(p,e). That is, the lock prevents a running process from stopping running and prevents events from clearing requests, but we can make a process not-runnable while it holds the lock. Redefine RequestOrder to watch the lock.
    • a|RequestOrder(p) =
      • error if a|Locked(p) < |Locked(p) and |RequestOrder(p) != (ready or idle), or if a|Runnable(p) < |Runnable(p) and |RequestOrder(p) != requesting
      • locked otherwise if |RequestOrder(p) = idle and a|Locked(p) > |Locked(p)
      • requesting otherwise if |RequestOrder(p) = locked and a|RequestWakeup(p,e) > |RequestWakeup(p,e) for some e
      • ready otherwise if a|Runnable(p) < |Runnable(p) and |RequestOrder(p) = requesting
      • idle otherwise if a|Locked(p) < |Locked(p) and |RequestOrder(p) = ready
      • |RequestOrder(p) otherwise.
  • Given Property L, |RequestOrder(p) != error does ensure |Safe(p).
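
For the code-minded, here is a minimal pthreads sketch of the same point (the code and names like event_happened are mine, not part of the model above): pthread_cond_wait releases the lock and sleeps as one atomic step, and the waker only changes the condition while holding the same lock, which is roughly Property L in practice.

    /* Sketch: the test and the sleep must be atomic with respect to the waker. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  wake = PTHREAD_COND_INITIALIZER;
    static int event_happened = 0;          /* the condition we sleep on */

    static void *waker(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);          /* the event cannot fire while the  */
        event_happened = 1;                 /* sleeper sits between its test    */
        pthread_cond_signal(&wake);         /* and its sleep, so no wakeup is   */
        pthread_mutex_unlock(&lock);        /* lost                             */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, waker, NULL);

        /* Unsafe version: "if (!event_happened) sleep();" -- the wakeup can
         * land between the test and the sleep and is lost forever.
         * Safe version: hold the lock across both, and let cond_wait do the
         * unlock-and-sleep atomically. */
        pthread_mutex_lock(&lock);
        while (!event_happened)                 /* re-test after every wakeup */
            pthread_cond_wait(&wake, &lock);    /* atomically unlock + sleep  */
        pthread_mutex_unlock(&lock);

        pthread_join(t, NULL);
        printf("no missed wakeup\n");
        return 0;
    }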

Data structures and algebra

[Edited 9/10] There’s an easy connection between data structures and basic abstract algebra that may or may not mean anything, but keeps getting rediscovered. I’ve never seen it clearly explained (anyone with a reference, or who notices an error in my rusty algebra, please send me a note! A discrete note would be nice :-) ).

What keeps on getting rediscovered is that objects with invertible and commutative methods would make things very convenient, especially for transactions and for concurrent access. What I have not seen explained is that this discovery reflects the distinction between groups and monoids.

A monoid is one of the simplest algebraic structures and is usually given as a set M of elements and a binary operation “op” on elements of M with the following properties:

  • Closure: for every a and b in M, (a op b)= c is also an element of M.
  • Associative: (a op (b op c)) = ((a op b) op c) for every a, b, and c in M
  • Identity: M contains an element I so that for every a in M, I op a = a op I =a.

That’s it. A pair <M,op> and three conditions: closure, associativity and identity. The natural numbers 0,1,… form a monoid with ordinary integer addition as the operation and 0 as the identity. The positive integers form a monoid with 1 as the identity and multiplication as the operation. More intriguingly for our purposes, the methods of an abstract data type or object generate a monoid.

A monoid <M, op> is “generated” from a set X if and only if X is a subset of M and every element of M can be constructed from elements of X using the operation. For example, the monoid of non-negative integers with addition as the operation can be generated from {0,1} since 2 = 1+1, 3 = (1+1)+1 and so on. One really simple type of monoid is the set of all finite strings that can be generated from an alphabet, with string concatenation as the operator and the empty string as the identity.

We can use the methods of an object as the generators of a monoid as follows. Let’s define an object by a set V of possible values and a set F of methods that can act on the object. A method is really just a function from V to V so that if the object has value v and we apply method f the object value becomes f(v). The binary associative operation of “function composition” can be used to construct “programs” from methods: (f comp (g comp h))(v) = f(g(h(v))). The monoid generated by the methods, then, is the set of programs that can be constructed using function composition on the methods. All we have to do further is insist that there is a null method Null(v)=v in F.
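
Here is a toy C sketch of that construction (the value type and the method names add3 and dbl are mine, just for concreteness): methods are ordinary functions from V to V, a “program” is a sequence of methods applied in order, and the null method is the identity of the monoid.

    /* Toy example: methods are functions V -> V, programs are compositions. */
    #include <stdio.h>

    typedef int V;
    typedef V (*method)(V);

    static V null_method(V v) { return v; }        /* the identity element   */
    static V add3(V v)        { return v + 3; }    /* two generating methods */
    static V dbl(V v)         { return v * 2; }

    /* The monoid operation: compose methods by applying them in sequence. */
    static V run(const method *prog, int len, V v)
    {
        for (int i = 0; i < len; i++)
            v = prog[i](v);
        return v;
    }

    int main(void)
    {
        method prog[] = { add3, dbl, null_method };    /* dbl(add3(v))  */
        printf("%d\n", run(prog, 3, 5));               /* (5+3)*2 = 16  */
        return 0;
    }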

For example, if our data structure is a stack and our methods are push_v for every v in some set of values V, plus pop, add (pop the top two values, add and push the result), sub, dup, swap, and null, we can build a Forth Monoid.
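
A throwaway C sketch of those generators (illustrative only, ignoring stack overflow and underflow, and turning the family of push_v methods into a push that takes its value as an argument):

    /* The "Forth Monoid" generators as functions on a stack value. */
    #include <stdio.h>

    #define MAXDEPTH 64

    struct stack { int n; int v[MAXDEPTH]; };

    static void push(struct stack *s, int x) { s->v[s->n++] = x; }
    static void pop(struct stack *s)         { s->n--; }          /* discards */
    static void dup(struct stack *s)         { push(s, s->v[s->n - 1]); }
    static void swap(struct stack *s)
    {
        int t = s->v[s->n - 1];
        s->v[s->n - 1] = s->v[s->n - 2];
        s->v[s->n - 2] = t;
    }
    static void add(struct stack *s)         /* pop two, push the sum */
    {
        int a = s->v[--s->n];
        int b = s->v[--s->n];
        push(s, a + b);
    }
    static void nullm(struct stack *s)       { (void)s; }         /* identity */

    int main(void)
    {
        struct stack s = { 0 };
        push(&s, 100);           /* (100)          */
        push(&s, 200);           /* (100,200)      */
        dup(&s);                 /* (100,200,200)  */
        add(&s);                 /* (100,400)      */
        swap(&s);                /* (400,100)      */
        pop(&s);                 /* (400)          */
        nullm(&s);               /* unchanged      */
        printf("top = %d, depth = %d\n", s.v[s.n - 1], s.n);
        return 0;
    }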

If a is an element of a monoid and there is a second element b so that a op b = I, then b is called the “inverse” of a. If there is an inverse for every element of the monoid, then the monoid is called a “group”. (Note that in a group, if b is the inverse of a, then a is the inverse of b, because of associativity: since a op b = I, let c be the inverse of b so that b op c = I; then (a op b) op c = I op c = c, but a op (b op c) = a op I = a, and since op is associative, a = c.) If the group operation commutes, the group is called Abelian or commutative.

Most monoids are not groups. For example, our Forth Monoid has an inverse for “dup” (“pop”), but pop does not have an inverse for a very interesting reason: pop loses information. Add has no inverse for the same reason: if the stack is (100,200) and an add method produces (300), then there is no operation that will magically deduce that 100 and 200 were on the stack before! Monoids that are not groups lose information when they compute programs. If an object is associated with a group, then no matter what the starting value of the object, if we know the program that was applied to the object, we can produce (in theory) an anti-program that returns us to the initial state. An example of an object that generates a group is an incrementer with the single method “add1” over the value set 0 to 2^32 - 1. The inverse of add1 is sub1, which can be generated by 2^32 - 1 add1’s (did the words “efficient” or “practical” appear anywhere above?).
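
Here is the incrementer shrunk to 8 bits, just so the loop that builds sub1 out of add1’s is actually watchable (purely illustrative C):

    /* The incrementer group at 8 bits: add1 iterated 2^8 - 1 times is sub1. */
    #include <stdio.h>
    #include <stdint.h>

    static uint8_t add1(uint8_t v) { return (uint8_t)(v + 1); }   /* wraps mod 2^8 */

    int main(void)
    {
        uint8_t v = 42;
        uint8_t w = v;

        for (int i = 0; i < 255; i++)          /* 2^8 - 1 applications of add1 */
            w = add1(w);

        printf("add1^255(%d) = %d (that is sub1)\n", v, w);   /* 41          */
        printf("add1(add1^255(%d)) = %d\n", v, add1(w));      /* back to 42  */
        return 0;
    }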

It has been noticed more than once that data structures whose monoids are groups would be easier to parallelize or to use for transactions. For example, if programs can be inverted, so that for every program p there is a -p with -p p = I, then we can roll back to earlier states. That is, if we apply f, g, h to the data structure and then realize that the transaction cannot be completed, we know that -h(-g(-f(f(g(h(x)))))) = x! We do not have to know what “x” was to undo the transaction. If the programs “commute”, life is even easier, since undoing a single method application can be done at any time, without undoing other method applications. With a little thought you can see that a commutative group data structure is convenient for parallel computation because the order of operations doesn’t matter (less synchronization required) and because every operation is reversible. So if task T starts modifying the data structure and a more important task K interrupts, T’s partially completed changes to the data structure can be rolled back and K can do what it needs.
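
A sketch of that rollback trick for the simplest group-like object I can think of (my example, nothing standard): a counter whose methods add(k) are each inverted by add(-k), so a transaction that cannot commit is undone by replaying the inverses in reverse order, without ever knowing the starting value.

    /* Undoing a transaction by applying inverse methods in reverse order. */
    #include <stdio.h>

    #define MAXLOG 16

    static long counter;             /* the shared object                      */
    static long applied[MAXLOG];     /* the methods this transaction applied   */
    static int  napplied;

    static void apply(long k)        /* method add(k)                          */
    {
        counter += k;
        applied[napplied++] = k;
    }

    static void rollback(void)       /* -p: the inverses, in reverse order     */
    {
        while (napplied > 0)
            counter -= applied[--napplied];
    }

    int main(void)
    {
        counter = 1000;                               /* unknown to the transaction */
        apply(7); apply(-3); apply(40);
        printf("mid-transaction: %ld\n", counter);    /* 1044                       */
        rollback();                                   /* cannot commit: undo it     */
        printf("after rollback:  %ld\n", counter);    /* 1000 again                 */
        return 0;
    }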

I’m not completely convinced that stumbling into group theory tells us anything useful about data structures, except that computer science is hard and is mostly concerned with horribly irregular structures that do not possess convenient properties. But groups and monoids keep popping up in computer science, even though theoretical CS people seem not to be interested anymore. Famously, state machines determine monoids (or semigroups), and there is a body of mathematics called Krohn-Rhodes theory which is concerned with factoring monoids into cascades of groups and residual monoids that correspond to splitting state machines into pipelines that have no feedback. Also there is a nice relationship between logging and group structure: most data structures cannot be practically converted into group-like structures, even though every monoid can be made into a group by adding a log.

Suppose we make our Forth object have values (data-stack, log-stack), where the log-stack is essentially a stack of values removed from the data stack, so that “pop” removes the top element of the data stack and pushes it onto the log stack – then “invert-pop” can pop the log-stack and push the value back onto the data stack. I wonder if “bounded invertibility” would be an interesting property. The interesting Forth objects have a bounded stack anyway. Suppose we add a smaller log-stack so we can invert some programs: the methods are invertible, and programs constructed by composing at most K methods are invertible.
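
A sketch of that log-stack (again, just illustrative C): pop no longer destroys information, it parks the value on the log so invert_pop can put it back, and bounding the log to K entries bounds how far back a program can be inverted.

    /* A Forth-style stack whose pop is invertible via a bounded log. */
    #include <stdio.h>

    #define MAXDEPTH 64
    #define MAXLOG    8           /* K: only the last K pops can be undone */

    static int data_stack[MAXDEPTH], ndata;
    static int log_stack[MAXLOG],    nlog;

    static void push(int x) { data_stack[ndata++] = x; }

    static void pop(void)                  /* pop, but remember what was lost */
    {
        int x = data_stack[--ndata];
        if (nlog < MAXLOG)
            log_stack[nlog++] = x;
    }

    static void invert_pop(void)           /* undo the most recent pop */
    {
        if (nlog > 0)
            push(log_stack[--nlog]);
    }

    int main(void)
    {
        push(100); push(200);
        pop();                   /* (100,200) -> (100), log holds 200 */
        invert_pop();            /* back to (100,200)                 */
        printf("top = %d, depth = %d\n", data_stack[ndata - 1], ndata);
        return 0;
    }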

Anyway, there is something worth digging out in all this: think of filesystems designed to be K-deep group-like.

(I was inspired or provoked to produce this note after attending a talk by Maurice Herlihy at UT today. I can’t say I understood much of Professor Herlihy’s talk (my fault, not his) but he was discussing the utility of converting data structure monoids to groups without mentioning either groups or monoids. I also found out that distributed shared memory is still around- how depressing.)

Semi-creepy drawing of Galois from here