QuickCheck for Dynamic Test Generation

[This is part 4 of the Writing Compilers series]

Continuing with the back-to-front dynamic compiler, in this post we will:

  • Use Stack to manage the growing project
  • Write an interpreter whose result values should be the same as the dynamic compiler
  • Use QuickCheck to generate tests by comparing the interpreter to the compiler
> stack new compiler

Now we can copy the Haskell file of our budding dynamic compiler into compiler/app, and build using stack:

> cp Main.hs compiler/app
> cd compiler
> stack build

And execute it using:

> stack exec compiler-exe
 mprotect = 0
 errno = 0
 42

QuickCheck

As we build out the compiler, it would be nice to know if its working as intended, not just on a few hand-picked test cases.  Writing additional test cases is always good, but I want something stronger — I want to enforce properties on the resulting system.  Hand-selecting test cases which test these properties hides the intent, why not just check the program for these properties directly?

Proving properties about programs is hard, sometimes impossible, but we can test individual cases easily.  QuickCheck provides a middle ground: state the properties directly, then use random sampling to generate test cases to check.  Failure of the QuickCheck property guarantees our property has failed, but passing QuickCheck does not prove the property.

This technique can be used to prove the property too, but it is inefficient in all but the simplest cases: generate all possible inputs to the QuickCheck property and test that it holds for all of them.  While this does not work for complex data structures, there is a theory that most bugs occur at (relatively) small input sizes.  This leads to another tool, SmallCheck, that does exactly this.  SmallCheck generates all possible inputs up to some depth, checking the property for all of them.  QuickCheck and SmallCheck together offer strong evidence that your property is held.

Properties

Properties are statements about some set of inputs to the program, and the program’s expected behavior given these inputs.  For example, the function “reverse” that reverses a list might be expected to hold the following properties:

  • reverse [] = []
  • reverse (a:as) = (reverse as) : a
  • reverse (reverse a) = a

These look remarkably like unit tests, but where unit tests would have example values, properties have variables.  QuickCheck allows any arbitrary predicate as a property, so you are not limited to simple equalities like these.

Here it is in Haskell, for one of the properties:

prop_reverse_reverse :: [Int] -> Bool
prop_reverse_reverse a = reverse (reverse a) == a

main = quickCheck prop_reverse_reverse

I’ve restricted the property to operate over lists-of-integers for this simple example.  Running this program returns:

> ./qc-tst
 +++ OK, passed 100 tests.

QuickCheck has generated 100 lists of various lengths containing random integers and tested the property against it, none of which failed.

Generating Programs

Let’s get QuickCheck running in our compiler, generating random programs!  By teaching QuickCheck about our abstract syntax, it will automatically explore the space of valid program grammars

Here I’m adding QuickCheck to the list of build-depends in Stack:

> nano compiler.cabal
...
  build-depends:       base
                     , compiler
                     , QuickCheck
                     , mtl
                     , binary
                     , bytestring
                     , vector
                     , unix
                     , transformers
...

In order for QuickCheck to test program properties, it will first need to know how to build a program.  Recall our trivial expression datatype Expr from last time:

data Expr
  = Fixnum Word32
    deriving (Show)

QuickCheck uses the type class ‘Arbitrary’ to generate an arbitrary instance of some datatype.  The simplest implementation for our Expr type would be:

instance Arbitrary Expr where
    arbitrary = liftM Fixnum arbitrary

Here, we are defining the function named ‘arbitrary’ for the type class Arbitrary which generates an Expr containing a Fixnum constructor over an arbitrary Word32 value.  Basically, we’re just picking a random 32-bit fixnum and returning it.

While this implementation will work fine, there are so many different 32-bit numbers that we may never really exercise the important parts of the data space.  It would be nice, for instance, to ensure we hit all the byte-level crossover points, such as the largest 7-bit number (127) and its immediate successor which requires 8-bits (128).  Here’s a list of important test cases:

test_fixnum =
    [ (Fixnum 0)
    , (Fixnum 1)
    , (Fixnum 127)          -- 7-bit transition cases
    , (Fixnum 128)
    , (Fixnum 255)          -- 8-bit
    , (Fixnum 256)
    , (Fixnum 65535)        -- 16-bit
    , (Fixnum 65536)
    , (Fixnum 16777215)     -- 24-bit
    , (Fixnum 16777216)
    , (Fixnum 0x7FFFFFFF)   -- 31-bit
    , (Fixnum 0x80000000)   -- 32-bit (goes negative)
    , (Fixnum 0xFFFFFFFF)   -- 32-bit
    ]

But we don’t want to ONLY test these few cases — there may be something lurking in our code that trips us up!  So let’s rewrite the Arbitrary type class to include our test cases and the fully-random version we wrote previously:

instance Arbitrary Expr where
   arbitrary = frequency 
    [ (20, elements test_fixnum)
    , (40, liftM Fixnum arbitrary)
    ]

This says we want some of the QuickCheck tests to select a random test case from our list of specific test cases, while most of the time we want completely random Exprs.

As the Expr datatype becomes more complex, these generators will also need to evolve so we can focus on the most important aspects of our datatype, while retaining some truly random inputs to keep us honest.

Program Properties

Now that we can generate a random program (though they are all trivial at this stage), it is time to examine what properties we want to ensure from our compiler.  A compiler is supposed to transform a text representation of the program into a machine-executable version that “does the same thing” as the composition of semantic elements described by the language and composed by the programmer.

So, one property we might want to check is “does this program do what the spec says it should?”  There is a bit of a chicken-and-egg situation here, since the spec itself is also just text and is no easier to check for correctness than the program.

We can get around this by implementing the language spec as an interpreter.  Each operation of the language is written out as operations against a machine model.  A program in the language can then be passed to the interpreter for execution, the result of which should be the same as if we had executed the program natively on the hardware.

We’ll start by choosing the type of the interpreter:

evaluate :: Expr -> Value

Our interpreter will be a function from our expression type Expr to a type of values.  Since we only have one type of value right now, the next step is easy:

data Value
  = VWord32 Word32
    deriving (Show)

The ‘evaluate’ function will have a case for each branch of Expr, performing the operations of the Expr on the internal machine model.  For now there is no machine model, so the ‘evaluate’ function itself is trivial too:

evaluate :: Expr -> Value
evaluate (Fixnum x) = VWord32 x

This says:  to evaluate an expression “(Fixnum x)”, we return the constant 32-bit value ‘x’.

Compiler vs. Interpreter

Back to the QuickCheck properties!  We now have two different implementations which should both compute the same answer; the compiler and the interpreter.  This makes writing a QuickCheck property relatively simple.  QuickCheck will generate an arbitrary program for us, then we will pass the program to both the compiler and interpreter for execution, collect the two results and compare them for any differences.  We would like to ensure they are always exactly the same.

prop_exec_eval_equiv mem expr = monadicIO $ do
  let VWord32 b = evaluate expr
  a <- run $ compileAndExecute mem expr
  assert $ (toInteger a) == (toInteger b)

This function does exactly as we have described: it receives an arbitrary Expr, calls the interpreter on it, receiving a result value ‘b’, then calls the dynamic compiler to compile and execute the same program natively, collecting the result ‘a’ from it.

The final line tells QuickCheck that we want these values to be equal.

Note that this property is specialized to handle return values which are 32-bit integers.  This will need to be generalized as we add more advanced data types to the language.

The main program simply allocates a buffer for dynamic code, and calls QuickCheck to fuzz our compiler.

main = do
       mem <- allocAligned codeBufferSize codeBufferSize
       _ <- quickCheck (prop_exec_eval_equiv mem)
       Prelude.putStrLn "Done!"

Building & running the project:

> stack build
...
> stack exec compiler-exe
+++ OK, passed 100 tests.
Done!

Next Steps

We have completed the first step of the compiler as described in “An Incremental Approach to Compiler Construction”, which we began in part 2 of this series.

Next time (part 5) we will add more immediate value types such as boolean and character, requiring modification of our infrastructure and QuickCheck properties.  In particular we need to choose a representation for tagging or ‘boxing’ our values to differentiate an integer from a boolean.

 

 

QuickCheck for Paxos

This is the second post in Paxos Made Complete, a series highlighting design decisions when building a system based on  Paxos and the State Machine Approach.  Code for the series is available in the github repository.  Today we will finish the Classic Paxos implementation by embedding it into a small simulator, then generating tests using QuickCheck.

Fuzzing with Event Streams

Critical infrastructure requires thorough testing, and distributed computing is an especially challenging domain to test.  The technique I have found most valuable is deterministic simulations driven by a generated event stream.  For small networks, I can simulate a complete multi-node system on a single machine and fuzz it with a stream of random events.  The deterministic nature of the simulator allows me to write the event stream to a file and replay it later for regression testing.

For Paxos, this means defining some interesting events which can occur in the real world, then writing a simulator which accepts a stream of these events and interprets them by calling our Paxos function(s).  Testing becomes the process of generating a random stream of events and running them in the simulator, checking assertions all the while.

Since the entire state of the global system is available locally, it is trivial to check global properties between each event, ensuring correctness at all times.  Another benefit to this technique is the exact event which causes the inconsistency is discovered immediately.

For this blog series, I chose a set of events which is sufficiently interesting to expose weaknesses in the protocol, but not so many as to be a burden to implement.  Here’s what I chose:

data Event
  = Deliver           -- Deliver the next outstanding message
  | Duplicate         -- Duplicate a message and add it to the network
  | Drop              -- Delete a message from the network
  | Shift             -- Rearrange message delivery order
  | Tick NodeID       -- Send Paxos a Timeout event (failure suspicion)
  | Req NodeID Value  -- Request a value to be added to the log

Deliver, Drop, and Duplicate should be self-explanatory.  Shift rearranges the message ordering to simulate packets delayed in the network.

Tick and Req may require a bit more explanation:

Tick Event

In production Paxos environments, there must be a signal to cause a Leader to send the Prepare message.  This signal is often derived from a Leader Election protocol.  The leader election is driven by timeouts and other inputs which combine to form a suspicion of node failure, causing the election process to commence.

The Tick event skips all of that – we declare that this node believes itself to be the Leader and will attempt to gain control of the voting.  This is a conservative strategy; all possible leader elections will be covered by placing the Tick event at the right spot in the event stream, as well as a variety of situations which we hope would not happen in the real world, but which Paxos should handle correctly anyway.

Req Event

In typical implementations there are two entities outside Paxos; a client application which is requesting values to be added to the log, and an under-consensus application consuming the values from the log.  For example, a web app sending database requests to the cloud, and the database applying the updates in the order specified by the Paxos log.

The Req event simulates the client’s request to add values to the log.  There are so many design decisions related to client semantics, expected behaviors, and control flow, I’ll have to leave that for a future post.  For now, we will assume that clients will resend requests if they don’t receive a response for awhile.

QuickCheck

QuickCheck is a library which generates random instances of a data type and feeds them into assertions to check a program’s properties.   If it finds a failed assertion, it will automatically trim the datatype, looking for the shortest/smallest example which triggers the assertion.  The result is an input to the assertion which causes it to fail — a counter-example to the assertion.

This makes QuickCheck particularly valuable for debugging; it will not just tell you an error has occurred, but offer the actual inputs needed to create the failure.  Haskell’s pure functional semantics has the added benefit of being deterministic, so these inputs can be stored as regression test cases without needing the often-complex setup code needed to test imperative programs.

In order to fuzz Paxos with QuickCheck, we must supply a few functions to teach QuickCheck how to walk through our data types and what (interesting) values to insert in each location.  Here’s a few examples for the types we defined last time:

instance Arbitrary NodeID where
   arbitrary = elements [NodeA,NodeB,NodeC]

instance Arbitrary Value where
   arbitrary = elements [ValueA,ValueB,ValueC]

instance Arbitrary Event where
   arbitrary = do

instance Arbitrary Event where
   arbitrary = do
      node <- arbitrary
      val  <- arbitrary
      frequency
        [ (50, elements [Deliver]) 
        , (40, elements [Drop])
        , (20, elements [Tick node])
        , (20, elements [Req node val])
        , (5, elements [Shift])
        , (5, elements [Duplicate])
        ]

Once QuickCheck is setup for our types, it’s time to define the properties we want to check.  A property is any deterministic predicate.  In Haskell, that means just about any function returning a boolean!

The Consistency Property

The main property of Paxos is its freedom from inconsistency; any correct node will reach the same decision, and decisions never change once made.  Our job is to encode this property as a function which takes a list of events and returns TRUE if Paxos held the property, or FALSE if an inconsistency was detected.

My tactic is similar to the method described in “Paxos Made Live”; execute random events against a fresh network, simulating an unstable period, then run a series of “stabilizing” events to flush out the protocol.  If the nodes disagree on the result, an inconsistency has occurred.  If we carefully select the stabilizing events such that they are the very minimum possible events to complete the protocol, we can also detect “live-lock” situations (since any remaining undecided nodes would be “out of spec”).

I’ll skip most of the infrastructure needed to get the simulator running.  Here are the important parts:

type Network = ([(From,To,Message)], Map NodeID Node)

...
handleEvent :: Network -> Event -> Network
handleEvent (msgs, net)     (Tick to)    = networkDeliver (From to) (To to) Timeout msgs net
handleEvent (msgs, net)     (Req to val) = networkDeliver (From to) (To to) (Request val) msgs net
handleEvent ([], net)       _            = ([],net)
-- Always deliver messages to ourself
handleEvent (((From from),(To to),msg):msgs, net) _ | from == to = networkDeliver (From from) (To to) msg msgs net
handleEvent ((from,to,msg):msgs, net) Deliver      = networkDeliver from to msg msgs net
handleEvent (msg:msgs, net) Drop         = (msgs, net)
handleEvent (m:ms, net)     Duplicate    = (m:m:ms, net)
handleEvent (m:ms, net)     Shift        = (ms ++ [m], net)

...
runNetwork :: Network -> [Event] -> Network
runNetwork net events = Prelude.foldl handleEvent net events

...
prop_ConsensusAchieved :: Network -> [Event] -> Bool
prop_ConsensusAchieved network events =
  let unstable    = runNetwork network $ (Tick NodeA):events
      (_,stable)  = stableEvents unstable stabilize
      allVals     = extractStableValues stable
  in Prelude.notElem NoValue (IntMap.elems allVals)

The function “prop_ConsensusAchieved” is the property we want to test.  It takes a list of Events (generated by QuickCheck), executes the events using handleEvent on a Network containing simulated nodes and a message queue.  Once the “unstable” network has run, we offer a stream of “stabilizing” events, after which we extract values and compare.  If this leaves us with an inconsistency or no agreed value, the property has failed.

Finally, we can invoke QuickCheck on this property:

quickCheckWith stdArgs { maxSuccess = 1000 } prop_ConsensusAchieved

Check out the github repository for the complete source code.

Design Decisions

  • I’ve used a technique often described in Paxos literature where all outgoing messages are also sent to our self for processing.  This allows the logic for each Role to remain separate and avoid code pollution and duplication.
    • The handleEvent function hides one important feature; messages sent to our self are always delivered.  This overrides the Drop, Duplicate, and Shift events.  As written, this still consumes an event from the queue.  A better technique would be to immediately deliver the message to our self, although this poses the risk for an infinite loop of self-delivery…
  • The selection of events is somewhat arbitrary.  Specifically Duplicate and Shift, which I wanted to be deterministic even if the intervening events were to generate a different number of output messages into the network queue.  The semantics I selected meet my criteria, but I’m not totally satisfied with them.
  • The design presented here is slightly more complex than necessary in preparation for an upcoming step which will allow us to test multiple different versions of Paxos against each other on the same event stream(s).

Next Steps

Next time we will modify Classic Paxos into Mult-Paxos, and it’ll blow your mind just how trivial the changes are!  From there we will begin examining efficiency by optimizing the structures and message formats to streamline the trivial implementation.