QuickCheck for Dynamic Test Generation

[This is part 4 of the Writing Compilers series]

Continuing with the back-to-front dynamic compiler, in this post we will:

  • Use Stack to manage the growing project
  • Write an interpreter whose result values should be the same as the dynamic compiler
  • Use QuickCheck to generate tests by comparing the interpreter to the compiler
> stack new compiler

Now we can copy the Haskell file of our budding dynamic compiler into compiler/app, and build using stack:

> cp Main.hs compiler/app
> cd compiler
> stack build

And execute it using:

> stack exec compiler-exe
 mprotect = 0
 errno = 0
 42

QuickCheck

As we build out the compiler, it would be nice to know if its working as intended, not just on a few hand-picked test cases.  Writing additional test cases is always good, but I want something stronger — I want to enforce properties on the resulting system.  Hand-selecting test cases which test these properties hides the intent, why not just check the program for these properties directly?

Proving properties about programs is hard, sometimes impossible, but we can test individual cases easily.  QuickCheck provides a middle ground: state the properties directly, then use random sampling to generate test cases to check.  Failure of the QuickCheck property guarantees our property has failed, but passing QuickCheck does not prove the property.

This technique can be used to prove the property too, but it is inefficient in all but the simplest cases: generate all possible inputs to the QuickCheck property and test that it holds for all of them.  While this does not work for complex data structures, there is a theory that most bugs occur at (relatively) small input sizes.  This leads to another tool, SmallCheck, that does exactly this.  SmallCheck generates all possible inputs up to some depth, checking the property for all of them.  QuickCheck and SmallCheck together offer strong evidence that your property is held.

Properties

Properties are statements about some set of inputs to the program, and the program’s expected behavior given these inputs.  For example, the function “reverse” that reverses a list might be expected to hold the following properties:

  • reverse [] = []
  • reverse (a:as) = (reverse as) : a
  • reverse (reverse a) = a

These look remarkably like unit tests, but where unit tests would have example values, properties have variables.  QuickCheck allows any arbitrary predicate as a property, so you are not limited to simple equalities like these.

Here it is in Haskell, for one of the properties:

prop_reverse_reverse :: [Int] -> Bool
prop_reverse_reverse a = reverse (reverse a) == a

main = quickCheck prop_reverse_reverse

I’ve restricted the property to operate over lists-of-integers for this simple example.  Running this program returns:

> ./qc-tst
 +++ OK, passed 100 tests.

QuickCheck has generated 100 lists of various lengths containing random integers and tested the property against it, none of which failed.

Generating Programs

Let’s get QuickCheck running in our compiler, generating random programs!  By teaching QuickCheck about our abstract syntax, it will automatically explore the space of valid program grammars

Here I’m adding QuickCheck to the list of build-depends in Stack:

> nano compiler.cabal
...
  build-depends:       base
                     , compiler
                     , QuickCheck
                     , mtl
                     , binary
                     , bytestring
                     , vector
                     , unix
                     , transformers
...

In order for QuickCheck to test program properties, it will first need to know how to build a program.  Recall our trivial expression datatype Expr from last time:

data Expr
  = Fixnum Word32
    deriving (Show)

QuickCheck uses the type class ‘Arbitrary’ to generate an arbitrary instance of some datatype.  The simplest implementation for our Expr type would be:

instance Arbitrary Expr where
    arbitrary = liftM Fixnum arbitrary

Here, we are defining the function named ‘arbitrary’ for the type class Arbitrary which generates an Expr containing a Fixnum constructor over an arbitrary Word32 value.  Basically, we’re just picking a random 32-bit fixnum and returning it.

While this implementation will work fine, there are so many different 32-bit numbers that we may never really exercise the important parts of the data space.  It would be nice, for instance, to ensure we hit all the byte-level crossover points, such as the largest 7-bit number (127) and its immediate successor which requires 8-bits (128).  Here’s a list of important test cases:

test_fixnum =
    [ (Fixnum 0)
    , (Fixnum 1)
    , (Fixnum 127)          -- 7-bit transition cases
    , (Fixnum 128)
    , (Fixnum 255)          -- 8-bit
    , (Fixnum 256)
    , (Fixnum 65535)        -- 16-bit
    , (Fixnum 65536)
    , (Fixnum 16777215)     -- 24-bit
    , (Fixnum 16777216)
    , (Fixnum 0x7FFFFFFF)   -- 31-bit
    , (Fixnum 0x80000000)   -- 32-bit (goes negative)
    , (Fixnum 0xFFFFFFFF)   -- 32-bit
    ]

But we don’t want to ONLY test these few cases — there may be something lurking in our code that trips us up!  So let’s rewrite the Arbitrary type class to include our test cases and the fully-random version we wrote previously:

instance Arbitrary Expr where
   arbitrary = frequency 
    [ (20, elements test_fixnum)
    , (40, liftM Fixnum arbitrary)
    ]

This says we want some of the QuickCheck tests to select a random test case from our list of specific test cases, while most of the time we want completely random Exprs.

As the Expr datatype becomes more complex, these generators will also need to evolve so we can focus on the most important aspects of our datatype, while retaining some truly random inputs to keep us honest.

Program Properties

Now that we can generate a random program (though they are all trivial at this stage), it is time to examine what properties we want to ensure from our compiler.  A compiler is supposed to transform a text representation of the program into a machine-executable version that “does the same thing” as the composition of semantic elements described by the language and composed by the programmer.

So, one property we might want to check is “does this program do what the spec says it should?”  There is a bit of a chicken-and-egg situation here, since the spec itself is also just text and is no easier to check for correctness than the program.

We can get around this by implementing the language spec as an interpreter.  Each operation of the language is written out as operations against a machine model.  A program in the language can then be passed to the interpreter for execution, the result of which should be the same as if we had executed the program natively on the hardware.

We’ll start by choosing the type of the interpreter:

evaluate :: Expr -> Value

Our interpreter will be a function from our expression type Expr to a type of values.  Since we only have one type of value right now, the next step is easy:

data Value
  = VWord32 Word32
    deriving (Show)

The ‘evaluate’ function will have a case for each branch of Expr, performing the operations of the Expr on the internal machine model.  For now there is no machine model, so the ‘evaluate’ function itself is trivial too:

evaluate :: Expr -> Value
evaluate (Fixnum x) = VWord32 x

This says:  to evaluate an expression “(Fixnum x)”, we return the constant 32-bit value ‘x’.

Compiler vs. Interpreter

Back to the QuickCheck properties!  We now have two different implementations which should both compute the same answer; the compiler and the interpreter.  This makes writing a QuickCheck property relatively simple.  QuickCheck will generate an arbitrary program for us, then we will pass the program to both the compiler and interpreter for execution, collect the two results and compare them for any differences.  We would like to ensure they are always exactly the same.

prop_exec_eval_equiv mem expr = monadicIO $ do
  let VWord32 b = evaluate expr
  a <- run $ compileAndExecute mem expr
  assert $ (toInteger a) == (toInteger b)

This function does exactly as we have described: it receives an arbitrary Expr, calls the interpreter on it, receiving a result value ‘b’, then calls the dynamic compiler to compile and execute the same program natively, collecting the result ‘a’ from it.

The final line tells QuickCheck that we want these values to be equal.

Note that this property is specialized to handle return values which are 32-bit integers.  This will need to be generalized as we add more advanced data types to the language.

The main program simply allocates a buffer for dynamic code, and calls QuickCheck to fuzz our compiler.

main = do
       mem <- allocAligned codeBufferSize codeBufferSize
       _ <- quickCheck (prop_exec_eval_equiv mem)
       Prelude.putStrLn "Done!"

Building & running the project:

> stack build
...
> stack exec compiler-exe
+++ OK, passed 100 tests.
Done!

Next Steps

We have completed the first step of the compiler as described in “An Incremental Approach to Compiler Construction”, which we began in part 2 of this series.

Next time (part 5) we will add more immediate value types such as boolean and character, requiring modification of our infrastructure and QuickCheck properties.  In particular we need to choose a representation for tagging or ‘boxing’ our values to differentiate an integer from a boolean.

 

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s