Stateful property-based testing with QuickCheck State Machine

Mike Solomon - Apr 17 '20 - - Dev Community

Property-based testing is a technique where you make assertions about a system's output with respect to its input. For example, if the input to a system (a function, a server, etc) is two numbers, property-based testing could assert that the output of the system should be the sum of these numbers. This type of testing frees you from having to come up with input data. Instead, you define relationships between the system's input and output. Then the test runner verifies that the relationships hold.

Stateful property-based testing (SPBT) is another technique for when the tested system retains a state. This is the case, for example, when the system is a database or a queue or a file. If I write an entry to a database and then list all entries in the database, I would expect the entry I wrote to be part of the list. That is a stateful property of the database.

There are libraries available in several different languages for SPBT. In this article, I will use quickcheck-state-machine. I like quickcheck-state-machine for many reasons:

  1. It is written in Haskell, which means you get access to Haskell's type safety and fast performance.
  2. Its opinionated structure splits SPBT into component parts, which helped my learning process.
  3. It builds a state machine, which can be manipulated outside of the test. quickcheck-state-machine's function prettyCommands uses the state machine, for example, to make really nice logs after the test is run.
  4. Fine-grained control of generation and shrinking is possible. This allows you to do more targeted testing.
  5. Its use of the higher-kinded types (HKTs) Symbolic and Concrete. It allows you to extract commands from a state machine using the Symbolic HKT and then run it using the Concrete HKT.
  6. It can test the parallel execution pf commands to find bugs arising from race conditions.

This article shows how to use quickcheck-state-machine to build a state machine and use it for SPBT. It uses version 0.7.0 of quickcheck-state-machine. The system under test will be a FIFO queue of integers that uses the file system to store entries.

As the quickcheck-state-machine library is under active development, the API is subject to change. I will do my best to revise this article as the API changes.

Model, Command, Response

The fundamental building blocks of a state machine built with quickcheck-state-machine come in three types. One represents a model of the system, one represents the commands that can be issued to the system and one represents responses to the commands.

All three need to be polymorphic in accepting an HKT with the signature (Type -> Type), which I'll call r. This polymorphism will never be used directly. But it is used by quickcheck-state-machine internally to inject two different HKTs: Symbolic and Concrete.

The Symbolic HKT is used by quickcheck-state-machine when generating a series of commands from a state machine. In contrast, the Concrete HKT is used when the state machine is executing. In models that can be created in pure contexts like the one below, this distinction is not useful. But when models use types that only exist in monadic contexts, the distinction is important.

data Model (r :: Type -> Type) = Model [Int] deriving (Show, Eq, Generic)
deriving anyclass instance ToExpr (Model Concrete)

data Command (r :: Type -> Type)
  = Push Int
  | Pop
  | AskLength
  deriving stock (Eq, Show, Generic1)
  deriving anyclass (Rank2.Functor, Rank2.Foldable, Rank2.Traversable, CommandNames)

data Response (r :: Type -> Type)
  = Pushed
  | Popped (Maybe Int)
  | TellLength Int
  deriving stock (Show, Generic1)
  deriving anyclass (Rank2.Foldable)
Enter fullscreen mode Exit fullscreen mode

Let's unpack what's going on here. The Model is an array of integers that we'll use to simulate a FIFO queue. There are three Commands - you can Push an integer onto the queue, Pop something off of the queue (either nothing or an integer), and AskLength to the queue. The Responses to these three commands are confirming that a value has been Pushed, telling us the integer that has been Popped and TellLength.

It isn't necessary to have a one-to-one correspondence between commands and responses. Haskell's pattern matching will allow us to define the function for any valid command/response pair.

Defining the queue

Here is a FIFO queue for integers that reads and writes the queue to the file system. Each integer is separated by a colon:

-- push to the head of the queue
pushToQueue :: String -> Int -> IO ()
pushToQueue fname x = do
    fe <- doesFileExist fname
    if (not fe) then do
            withFile fname WriteMode $ \handle -> hPutStr handle $ show x -- write the number
        else do
            txt <- withFile fname ReadMode $ \handle -> hGetLine handle
            let split = splitOn ":" txt
            -- append the number to the beginning of the string
            withFile fname WriteMode $ \handle -> hPutStr handle $ intercalate ":" (show x : split)

-- pop from the back of the queue
popFromQueue :: String -> IO (Maybe Int)
popFromQueue fname = do
    fe <- doesFileExist fname
    if (not fe) then return $ Nothing else do
        txt <- withFile fname ReadMode $ \handle -> hGetLine handle
        let split = splitOn ":" txt
        if (length split == 1) then
                -- remove the file if queue is empty
                removeFile fname
            else
                -- remove the last element
                withFile fname WriteMode $ \handle -> hPutStr handle $ intercalate ":" $ init split
        return $ if null split then Nothing else Just (read (last split) :: Int)

-- get the length of the queue
lengthQueue :: String -> IO Int
lengthQueue fname = do
    fe <- doesFileExist fname
    if (not fe) then return 0 else do
        txt <- withFile fname ReadMode $ \handle -> hGetLine handle
        let split = splitOn ":" txt
        return $ length split
Enter fullscreen mode Exit fullscreen mode

Initializing the model

The first step in creating my state machine is to initialize the model. The initializer function needs to be polymorphic as it will eventually accept the Symbolic and Concrete HKTs depending on if you are in generation or execution mode. In this case, as I am using an array as the underlying model, the logical initializer is an empty array.

initModel :: Model r
initModel = Model []
Enter fullscreen mode Exit fullscreen mode

Transitions

The next thing I need to do for my state machine is to create transitions. The transitions are used to both generate commands and execute the tests, so the function needs to remain polymorphic.

The transition function takes a model, a command, and a response. It then returns the underlying model after the command has been applied. You can think of the model as transitioning from one state to the next.

In the implementation below, I make my own FIFO queue with Pop and Push. Then AskLength will return the length of the model:

transition :: Model r -> Command r -> Response r -> Model r
transition (Model m) (Push x) Pushed = Model (x : m)
transition (Model m) Pop (Popped _) = Model (if null m then m else init m)
transition m AskLength (TellLength _) = m
Enter fullscreen mode Exit fullscreen mode

Preconditions

Preconditions are guards that apply to certain commands based on the current state. Top represents the precondition always being satisfied. Bot is the opposite, the precondition is never satisfied. The Logic type contains various boolean operators that can be applied to the model and command. The outcome of the operator determines if the precondition is satisfied or not.

Because the pre-condition is only used when generating lists of programs, it doesn't need to use concrete values. So it doesn't need to be polymorphic and exists only for the Symbolic HKT.

In this model, every command can be executed irrespective of the state. So I return Top:

precondition :: Model Symbolic -> Command Symbolic -> Logic
precondition _ _ = Top
Enter fullscreen mode Exit fullscreen mode

Postconditions

Postconditions are where the correctness of the response is asserted. I like this API because it provides a one-stop-shop for all assertions. In other SPBT libraries, it is easy to litter assertions all over the place, which makes the code more difficult to read. In quickcheck-state-machine, the only checks for correct behavior are in the postconditions.

Postconditions only are checked when the state machine is actually running. This means they only exist in the Concrete HKT.

Note that the model passed to the postcondition function is the one before the command executes. It is often useful to apply the transition to the model when evaluating the response, as I do below:

postcondition :: Model Concrete -> Command Concrete -> Response Concrete -> Logic
-- after a push, assert the pushed element is at the head of the new model
postcondition mod cmd@(Push x) resp = x .== head m'
  where Model m' = transition mod cmd resp
-- after a pop, assert that the popped element is at the end of the old model 
postcondition (Model m) Pop (Popped x) = x .== if null m then Nothing else Just $ last m
-- the length of the model and the length of the SUT should always be aligned
postcondition (Model m) AskLength (TellLength x) = length m .== x
Enter fullscreen mode Exit fullscreen mode

Invariants

Invariants take a model and assert that the model is always in a certain state, irrespective of the command and response. Invariants also run after every step in the state machine, which makes them expensive to run. Because of this, quickcheck-state-machine uses a Maybe to allow for no invariants to be returned.

As there is no invariant behavior I want to see in this model, I can return Nothing:

invariant = Nothing
Enter fullscreen mode Exit fullscreen mode

Generator

The generator is one of the places that quickcheck-state-machine really shines. You create generators using QuickCheck combinators, so any existing QuickCheck custom combinators can be repurposed for quickcheck-state-machine.

Here, I use the oneof combinator, which generates commands with a uniform distribution. Because I am in the command generation phase, the Symbolic HKT is used:

generator :: Model Symbolic -> Maybe (Gen (Command Symbolic))
generator _ = Just $ oneof [(pure Pop), (Push <$> arbitrary), (pure AskLength)]
Enter fullscreen mode Exit fullscreen mode

Shrinker

Like in QuickCheck, the shrinker takes a value and returns an array of new values to test. Most QuickCheck programs never use the shrinker directly, but here, I use it to specify what does and doesn't need to be shrunk. This allows the generation to move fast through values that have no logical relationship.

For example, below, I only apply the shrinker to numbers pushed onto the stack, as I want to test if the size of the numbers matters. In all other places, there is no shrinker used:

shrinker :: Model Symbolic -> Command Symbolic -> [Command Symbolic]
shrinker _ (Push x) = [ Push x' | x' <- shrink x ]
shrinker _ _  = []
Enter fullscreen mode Exit fullscreen mode

Semantics

Semantics take a command using the Concrete HKT, which signifies that it is used only when the tests are actually executing, and returns the result of the execution in the monadic context (here IO).

semantics :: String -> Command Concrete -> IO (Response Concrete)
semantics fname (Push x) = do
    pushToQueue fname x
    return Pushed
semantics fname Pop = do
    val <- popFromQueue fname
    return $ Popped val
semantics fname AskLength = do
    val <- lengthQueue fname
    return $ TellLength val
Enter fullscreen mode Exit fullscreen mode

Mock

The purpose of Mock is to generate dummy responses when the state machine is in command generation mode (thus the Symbolic HKT). It is the foil to Semantics, which creates Concrete responses from real commands during text execution mode. The content of the mock responses is thrown away, as all the library uses mock for is to create a Response used to effectuate a transition between states.

One nice thing about mock is that, if you want to, you can create a full-fledged mock of your model, and this can be useful if you'd like to use SPBT to generate (Comamnd Response) pairs that can be used to induce a spec of the model (ie to induce a JSON schema or an OpenAPI spec).

mock :: Model Symbolic -> Command Symbolic -> GenSym (Response Symbolic)
mock _ (Push _) = pure Pushed
mock _ Pop = pure $ Popped Nothing
mock _ AskLength = pure $ TellLength 0
Enter fullscreen mode Exit fullscreen mode

Cleanup

Cleanup, like the semantics, exists within the monad that the system is executing. It's called after each series of commands is executed.

Since I don't need any cleanup for our queue, I can write an empty function in the IO monadic context:

cleanup :: model Concrete -> IO ()
cleanup _ = return ()
Enter fullscreen mode Exit fullscreen mode

Building the state machine

Now that I have all of the ingredients, I can build my state machine.

Because the system under test takes one argument, I also pass that argument to the state machine.

sm :: String -> StateMachine Model Command IO Response
sm s = StateMachine initModel transition precondition postcondition
      invariant generator shrinker (semantics s) mock cleanup
Enter fullscreen mode Exit fullscreen mode

Testing

Now for the fun part, let's run my tests!

First, I want each test to execute in its own FIFO queue, which means a different file for each queue. I chose the pcg unique random number generator to accomplish this. This guarantees that each number generated will be unique during the run of a program.

newRand :: IO Int
newRand = do
  g <- create
  i <- uniform g
  return i
Enter fullscreen mode Exit fullscreen mode

Then, I define the property.

forAllCommands uses a state machine in its first argument to generate the commands. This is a state machine that will only run for the Symbolic HKT, not the Concrete HKT, so it won't ever touch IO. The next argument is a lower bound for the number of commands in a sequence. I use Nothing for the lower bound, meaning no lower bound.

The last argument to forAllCommands is a function that accepts a sequence of commands and returns a monadic property. Monadic properties are defined in QuickCheck.Monadic and can be used whenever a property exists in a monadic context. The convenience method monadicIO can be used to define properties in the IO context, and run lifts the result of the monadic execution to the PropertyM context. PropertyM is QuickCheck's monad transformer, and in our case we are transforming the IO monad. To learn more about monadic transformations, mtl is a great place to start.

So, the sequence below is:

  1. Create a new random number and lift it to the PropertyM monadic context.
  2. Create a file name using this number.
  3. Create a state machine using this filename.
  4. Call runCommands from quickcheck-state-machine, which already executes in the PropertyM context, so there is no need to prefix it with run.
  5. Use quickcheck-state-machine's pretty printer prettyCommands on the histogram generated by run result.

state_machine_properties :: Property
state_machine_properties = forAllCommands (sm "") Nothing $ \cmds -> monadicIO $ do
  id <- run newRand
  let fname = "queues/queue" <> (show id) <> ".txt"
  let sm' = sm fname
  (hist, _model, res) <- runCommands sm' cmds
  prettyCommands sm' hist (checkCommandNames cmds (res === Ok))
Enter fullscreen mode Exit fullscreen mode

Lastly, I execute the test and create the queues directory if it doesn't exist yet:

main :: IO ()
main = do
    createDirectoryIfMissing False "queues"
    quickCheck state_machine_properties
Enter fullscreen mode Exit fullscreen mode

When I run stack test from the command line, I see the following:

quickcheck-state-machine-tutorial> test (suite: quickcheck-state-machine-tutorial-test)

+++ OK, passed 100 tests.

Commands (264 in total):
33.7% Pop
33.7% Push
32.6% AskLength

quickcheck-state-machine-tutorial> Test suite quickcheck-state-machine-tutorial-test passed
Completed 2 action(s).
Enter fullscreen mode Exit fullscreen mode

And voila! Our tests pass.

GitHub repo

All of this is on the github repo 'meeshkan/quickcheck-state-machine-example'.

Conclusion

Stateful property based testing is a great way to find bugs in stateful systems. SPBT exists in several other frameworks as well:

I hope you find this technique useful! If you'd like your repos to benefit from automatic SPBT, I will take this opportunity to shamelessly plug the Meeshkan alpha on meeshkan.com.

Follow up exercises

Here are three follow up exercises you can do to expand your understanding of quickcheck-state-machine and SPBT!

Novice

Let's create a bug in the queue!

In the implementation of the FIFO queue, instead of adding the number to the head via show x : split, add it to the tail using split ++ [show x]. See if it's caught.

Intermediate

Create a new bug where the queue stops accepting new values once there are 50 values.

You can do this in the pushToQueue function. If you run the tests as-is, you won't find it. There are three ways you can find the bug:

  • Increase the number of times QuickCheck runs
  • Change the generator so that the frequency of push is greater than the frequency of pop. For inspiration, check out QuickCheck generator combinators and see if there is one that allows certain outcomes to happen with greater frequency than others.
  • Change the lower bound in forAllCommands from Nothing to something a bit higher.

This may take a long time to run depending on your parameters because of the shrinker. The shrinker will try to find a specific range of values to produce the bug, but because the bug is not linked to the specific value, it won't be able to meaningfully shrink.

For example, here is an excerpt of console output that would happen if you are able to provoke the bug. Here, I see that the postcondition for AskLength failed at the barrier of 50 results in the queue.

Model [+0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0]

   == AskLength ==> TellLength 50 [ 0 ]

Model [0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0
      ,0]

PostconditionFailed "PredicateC (51 :/= 50)" /= Ok
Enter fullscreen mode Exit fullscreen mode

Advanced

In the implementation of the queue, we use the funtion withFile for all file-based IO. Haskell also has the functions writeFile and readFile. Try using these instead and you'll hit a nasty bug!

Can you anticipate what the bug will be? Once you run into the bug, was your guess right? How does this bug show the difference between withFile vs writeFile and readFile?

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .