Introduction to properties-driven development

Kimmo Sääskilahti - Apr 27 '20 - - Dev Community

This article was edited by Carolyn Stransky and inspired by the Property-Based Testing with PropEr, Erlang, and Elixir book.

Properties-driven development means the application of property-based testing to test-driven development. Test-driven development asks us to constantly write tests to ensure our code is easily testable and usable. Property-based testing asks us to write test case generators instead of hard-coded example inputs and outputs and ensure given properties hold for our code.

Thinking in terms of properties forces us to be very explicit about what our code can and cannot do. We're effectively adopting a design by contract approach, which helps us understand the problem we're trying to solve before diving into coding.

In this article, we'll learn what properties-driven development looks like. We'll also apply these principles to develop a module for a sorted dictionary.

What's in this tutorial

⚠️ Prerequisites:

  • A general understanding of software testing.
  • (Optional) Python 3+* if you want to follow along.

* This guide will use Python for code examples, but the concepts aren't limited to any specific programming language. So even if you don't know Python, we'd encourage you to read along anyway.

💻 References:

This GitHub repository contains all the featured code examples as tests. The repository also contains instructions for how to execute them.

What is properties-driven development?

As mentioned above, properties-driven development is test-driven development with property-based tests. Test-driven development asks us to think what our code should do and put that into a test. In the context of this article, it's not that important whether we write our tests before, after, or during the implementation. What's important is that tests guide the development. Property-based testing asks us to formulate those tests as properties.

For example, let's say we're writing code for converting a CSV into a JSON array. Instead of jumping into writing a CSV parser, test-driven development asks us to first come up with test cases.

Here's an example CSV input:

a,b,c
3,2,1
6,3,2
Enter fullscreen mode Exit fullscreen mode

This should be turned into the following JSON:

[
  { "a": 3, "b": 2, "c": 1 },
  { "a": 6, "b": 3, "c": 2 }
]
Enter fullscreen mode Exit fullscreen mode

This example would make a nice unit test for our code! However, the test would have quite a few assumptions baked in:

  • Keys are distinct.
  • Values are integers.
  • There are no missing values.

Even if we wanted to be thorough and write unit tests to cover each of those assumptions... our imaginations are limited. For example, we already forgot to mention the assumptions that the list of keys is non-empty and that there's at least one row in the CSV.

Property-based tests are excellent for forcing us to be explicit about our assumptions. To come up with properties for our CSV-to-JSON parser, we would need to first generate CSVs we expect our parser to be able to handle.

Here's the pseudocode for this type of generator:

  1. Generate keys: A list of arbitrary strings.
  2. Generate the number of rows: A non-negative integer.
  3. Generate rows: For each row and key, generate an arbitrary string value.

Can you see how many tricky cases our generator forces us to cover? The list of keys may be empty or the number of rows may be zero. We also assume our code works with arbitrary strings (including empty strings and strings containing commas). And that those strings can work as both keys and values. Creating a generator ensures that our CSV parser is pushed to its limits.

But this generator might turn out to be too demanding for our use case. If that happens, we can relax the generator to produce friendlier data. For example, we may find that our code crashes with empty input. If we know that empty inputs are filtered out elsewhere in the code, we can change the generator to produce only non-empty inputs and add an assertion in our code that the input is expected be non-empty. Instead of programming by coincidence, we've consciously made the decision what our code is expected to handle.

The generator we created is an example of a bottom-up approach to data generation. Instead of generating random CSVs directly, we generate them step by step and keep track of what goes in. So when the time comes to write an assertion that our code did the right thing, we know what the expected result is. This avoids the problem where we need to duplicate the possibly complex implementation in tests. For example, with our generator, we know the length of the resulting JSON array should be equal to the number of rows generated at step 2. That's a good property! If you want to dive deeper into how to apply property-based testing to CSV parsing, take a look at this chapter from the PropEr testing book.

With that, we should have an understanding of what properties-driven development is. Next, let's apply these principles in an example project.

Example project: Sorted dictionary

As an example project, we'll build a custom dictionary in Python. We'll call the data structure SortedDict and expect it to always keep its keys in a particular order. This type of sorted dictionary might be useful, for example, when you want to keep track of users sorted by birth date. By keeping keys sorted, we're able to traverse the sorted list of key-value pairs in linear time.

For our project, we'll keep the keys sorted by storing the key-value pairs in a binary search tree.

Note: Because the standard tree could be unbalanced, and therefore ineffective, you should use sortedcontainers in production software.

Implementing a sorted dictionary is a good example for properties-driven development for a few reasons. First, while property-based testing is most common in functional programming where data structures are immutable, the example shows it can be equally as useful for testing mutable objects. Second, while the implementation of the binary tree is straightforward, it's also complex enough to deserve good tests. Especially the deletion logic can be error-prone and hide complex bugs.

To implement the binary search tree, we'll resort to the reference algorithms from Introduction to Algorithms textbook.

For the sake of this article, we'll assume the keys are integers and that the keys themselves are used for comparison (instead of providing a custom callable per value like SortedDict in sortedcontainers does).

What should it do?

To come up with properties, we first need to come up with the requirements for our SortedDict. One way to come up with requirements is to play around with the API that we expect our module to have.

First, we expect that we can search for an inserted key:

>>> # Insert and search
>>> sorted_dict = SortedDict()
>>> sorted_dict[2] = 'two'
>>> sorted_dict[2]
'two'
Enter fullscreen mode Exit fullscreen mode

We also expect that keys are always sorted irrespective of the insertion order. Continuing on the previous example:

>>> # Keys are sorted
>>> sorted_dict[1] = 'one'
>>> sorted_dict.keys()
[1, 2]
Enter fullscreen mode Exit fullscreen mode

Like with a standard dictionary in Python, we expect re-inserting an existing key will result in the value being overwritten:

>>> # Handles duplicate keys
>>> sorted_dict[2] = 'two-two'
>>> sorted_dict[2]
'two-two'
Enter fullscreen mode Exit fullscreen mode

We expect that searching for a non-existing key will raise a KeyError:

>>> # Non-existing key
>>> sorted_dict[3]
Traceback (most recent call last):
KeyError: ...
Enter fullscreen mode Exit fullscreen mode

Note: ... represents the rest of the KeyError log.

Finally, if we delete a key, we expect that searching for it will also raise a KeyError:

>>> # Searching for deleted key
>>> del sorted_dict[1]
>>> sorted_dict[1]
Traceback (most recent call last):
KeyError: ...
Enter fullscreen mode Exit fullscreen mode

Listing requirements

Now that we have an idea of what we expect our code to do, we can try to generalize those previous examples into requirements:

  1. Key-value pairs added to our sorted dictionary can be searched.
  2. Adding a key that already exists overwrites the existing one.
  3. Keys are always sorted.
  4. Searching for a non-existing key raises a KeyError.
  5. Deleting a key and then searching for it raises a KeyError.

These requirements will serve as the basis for our properties.

Next, we'll write a property for the first requirement: Being able to insert key-value pairs into the dictionary and then search for them.

Implementing insert and search

Before writing the test case generators, we first add the basic skeleton for SortedDict:

# sorted_dict.py
class SortedDict:
    def __init__(self):
        """
        Sorted dictionary, keeps key-value pairs
        sorted by the key.
        """
        pass

    def __setitem__(self, key, item):
        """
        Add a key-value pair to the dictionary.
        """
        pass

    def __getitem__(self, key):
        """
        Get a key-value pair from the dictionary.
        """
        pass
Enter fullscreen mode Exit fullscreen mode

The __setitem__ method is called when sorted_dict[key] = item is invoked. Similarly, __getitem__ defines the behaviour for sorted_dict[key].

This skeleton doesn't implement setting or getting the keys yet. So once we're done writing the test, we can run it to make sure it fails. This gives us confidence our test works as expected.

Generator

We'll use the Hypothesis library for writing properties for our sorted dictionary. Hypothesis supports generating almost any kind of data you can imagine. It also provides decorators for writing property-based tests.

To get started with writing our first property-based test, we need a generator of key-value tuples. The entry point to generators in Hypothesis is the hypothesis.strategies module, which we alias as some below because it reads nicely. Strategies in Hypothesis are essentially "clever data generators" that you can compose to generate very complex data.

Here's how to compose generators in Hypothesis to produce a list of key-value tuples:

# test_sorted_dict.py
import hypothesis.strategies as some

def some_key_value_tuples():
    """
    Generator for lists of key-value tuples.
    """
    some_keys = some.integers()
    some_values = some.binary()
    some_kvs = some.tuples(some_keys, some_values)
    return some.lists(some_kvs)
Enter fullscreen mode Exit fullscreen mode

The function first defines a generator some_keys for keys, which we assume are integers. For values, we assume any binary is valid. We then define some_kvs, a generator of key-value tuples using some_keys and some_values as generators of keys and values, respectively. Finally, we return the generator some.lists(some_kvs), which generates lists of tuples.

To see what kind of data the generator creates, we can call some_key_value_tuples().example():

>>> some_key_value_tuples().example()
[]
>>> some_key_value_tuples().example()
[(53, b'{\x8b\xed\x92\xa8\xcb\x7fq\x95')]
>>> some_key_value_tuples().example()
[(-19443, b'\x16ERa'), (-425, b'')]
Enter fullscreen mode Exit fullscreen mode

Now that we have a generator for lists of key-value tuples, we want to build a generator generating instances of SortedDict containing those tuples. With Hypothesis, we can use hypothesis.strategies.composite for this purpose as follows:

# test_sorted_dict.py
@some.composite
def some_sorted_dicts(draw):
    """
    Generator of sorted dicts.
    """
    key_values = draw(some_key_value_tuples())

    sorted_dict = SortedDict()
    for key, val in key_values:
        sorted_dict[key] = val
    return sorted_dict
Enter fullscreen mode Exit fullscreen mode

The @some.composite decorator injects the draw function into the decorated function definition. The draw function can be used to sample values from another generator. Above, we sample a list of key-value tuples from the some_key_value_tuples() generator and then insert those key-value pairs into our SortedDict. Now some_sorted_dicts() is a data generator generating instances of SortedDict.

However, there's a problem: when we get an instance of SortedDict from the generator, we don't know anymore what data went into creating the sorted dictionary. We therefore also return the key-value pairs from the generator, with the pairs stored in a standard Python dictionary:

# test_sorted_dict.py
@some.composite
def some_sorted_dicts(draw):
    """
    Generator of sorted dicts. Returns a tuple of the sorted dictionary and a dictionary holding the key-value pairs used for data generation.
    """
    ... # define sorted_dict as as above

    expected = {}
    for key, val in key_values:
        expected[key] = val

    return sorted_dict, expected
Enter fullscreen mode Exit fullscreen mode

The variable expected contains the key-value pairs we expect to find from our sorted dictionary. The key-value pairs it contains act as a "model" for what our sorted dictionary should contain. Using a standard dictionary as the model is useful, because it handles duplicates in the same way we expect SortedDict to handle, i.e., by overwrite.

Property

With the test case generator some_sorted_dicts defined, we're ready to state our first property: Any keys inserted into the dictionary can be searched.

# test_sorted_dict.py
from hypothesis import given

@given(dict_and_values=some_sorted_dicts())
def test_insert_and_search(dict_and_values):
    """
    Key-value pairs added to sorted dictionary can be searched.
    """
    sorted_dict, expected = dict_and_values

    for key, value in expected.items():
        in_dict = sorted_dict[key]
        assert in_dict == value
Enter fullscreen mode Exit fullscreen mode

The test case is marked as a Hypothesis test with the @given function decorator. When this test is run, Hypothesis will generate 100 different sorted dictionaries and verify that all of the key-value pairs expected in the dictionary are found.

Note: 100 is the default number of satisfying test cases that will run before terminating in Hypothesis. You can change this value by adding a @settings object.

If we run the tests now, they should fail: we haven't implemented __setitem__ and __getitem__ yet. For those, we need the binary search tree.

Binary search tree

For the binary search tree, we'll use the implementation from the Introduction to Algorithms book.

Tree is defined as a dataclass as follows:

# tree.py
from dataclasses import dataclass
import typing as t

@dataclass
class Tree:
    """
    Binary search tree.
    """

    root: t.Optional[Node] = None

    def __repr__(self):
        return "Tree(root={})".format(repr(self.root))
Enter fullscreen mode Exit fullscreen mode

The @dataclass decorator adds, for example, an __init__ method for creating a tree with an optional root argument. If the tree is empty, root is equal to None. Otherwise, it contains a Node defined like this:

# tree.py

@dataclass
class Node:
    key: int
    value: t.Any
    parent: t.Optional["Node"] = None
    left: t.Optional["Node"] = None
    right: t.Optional["Node"] = None

    def __repr__(self):
        return "Node(key={}, value={}, left=({}), right=({})".format(
            self.key, repr(self.value), repr(self.left), repr(self.right)
        )
Enter fullscreen mode Exit fullscreen mode

A Node has a key and a value. It also contains references to its left and right children as well as its parent.

Note: parent isn't included in __repr__ to avoid infinite loops (printing a child then prints the parent, which prints the child, which prints the parent...).

Inserting a key and a value to the tree happens as follows:

# tree.py
def insert(tree: Tree, key, value):
    """
    Reference insertion algorithm from Introduction to Algorithms.
    Operates in-place.
    """
    y = None
    x = tree.root

    while x is not None:
        y = x
        if key < x.key:
            x = x.left
        elif key > x.key:
            x = x.right
        else:
            x.value = value
            return

    z = Node(key=key, value=value, parent=y)
    if y is None:
        tree.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z
Enter fullscreen mode Exit fullscreen mode

Note: If key is equal to an existing key, the value in the corresponding node is updated in-place.

To search from the tree, we need to go down the tree until a match is found. Or, if no match is found, raise a KeyError:

# tree.py
def search(tree: Tree, key):
    node = _search_node(tree, key)
    return node.value


def _search_node(tree: Tree, key) -> Node:
    if tree.root is None:
        raise KeyError("Empty dictionary, can't find key {}".format(key))

    x = tree.root

    while x is not None:
        if key < x.key:
            x = x.left
        elif key > x.key:
            x = x.right
        else:
            return x

    raise KeyError("Key {} not found".format(key))
Enter fullscreen mode Exit fullscreen mode

Our search involves a helper function _search_node that searches a Node by key. We'll need this when implementing deletion later.

Putting it all together

With the tree supporting insert and search, we can use this tree in our SortedDict:

# sorted_dict.py
from . import tree

class SortedDict:
    def __init__(self):
        self._tree = tree.Tree()

    def __setitem__(self, key, item):
        tree.insert(self._tree, key, item)

    def __getitem__(self, key):
        return tree.search(self._tree, key)
Enter fullscreen mode Exit fullscreen mode

If we now run the property-based test for insertion and search, we should see it pass!

Now we have implemented the first requirement with a property-based test. Let's take a look at our remaining requirements:

  1. Key-value pairs added to our sorted dictionary can be searched.
  2. Adding a key that already exists overwrites the existing one.
  3. Keys are always sorted.
  4. Searching for a non-existing key raises a KeyError.
  5. Deleting a key and then searching for it raises a KeyError.

In this case, we'll also consider our second requirement implemented as we trust Hypothesis to have generated duplicate keys. We could work on the requirement of keeping keys sorted next, but we want to be sure the requirement also holds after deletions. So that's the next requirement we'll tackle: Deleting a key and then searching for it raises a KeyError.

Implementing deletion

The requirement states that searching for a deleted key raises a KeyError. But how can we put that into a property? One way is to use our some_sorted_dicts() generator from earlier and let Hypothesis draw one of the keys for deletion. We then delete that key and ensure searching raises a KeyError.

To draw a key to delete, we could use composite again to create a composite strategy. This strategy would yield a sorted dictionary, the dictionary of expected contents, and a key to delete. However, in this case, it makes more sense to use the built-in data strategy from Hypothesis. This strategy comes with the draw method and we can use it to sample a value from a generator during the test.

Here's how we can use data() to sample a key to delete from the list of inserted keys:

# test_sorted_dict.py

@given(
    dict_and_values=some_sorted_dicts(),
    data=some.data(),
)
def test_search_after_delete(dict_and_values, data):
    """
    Searching for a key after deleting that key raises a KeyError.
    """
    sorted_dict, expected = dict_and_values
    inserted_keys = list(expected.keys())
    some_key = some.sampled_from(inserted_keys)
    key_to_delete = data.draw(some_key, label="Key to delete")
    del sorted_dict[key_to_delete]
    with pytest.raises(KeyError):  # type: ignore
        sorted_dict[key_to_delete]
Enter fullscreen mode Exit fullscreen mode

This test first uses sampled_from to create a generator called some_key. When sampled, some_key randomly chooses one key from the list of inserted keys. We then use data.draw() to sample a key that we want to delete. The label argument is added so that Hypothesis can print a clearer error message if it finds a failing test case. Once a key is drawn, we delete the key and verify that searching for it raises a KeyError.

But there's a problem with our implementation: The list of inserted keys may be empty. We need to cover that case by ensuring the test only runs for non-empty dictionaries. We can do that by using filter with our generator:

# test_sorted_dict.py

@given(
    dict_and_values=some_sorted_dicts().filter(lambda drawn: len(drawn[1]) > 0),
    data=some.data(),
)
def test_search_after_delete(dict_and_values, data):
    ...
Enter fullscreen mode Exit fullscreen mode

Note: Throughout the rest of the tutorial, ... represents the rest of the function or class as it was previous written.

The filter ensures that the dictionary is non-empty.

With the property ready, we can now move on to the implementation. The delete operation in our test, del sorted_dict[key], translates to sorted_dict.__delitem__(key). So we need to define __delitem__ in SortedDict:

# sorted_dict.py
class SortedDict:
    ...

    def __delitem__(self, key):
        return tree.delete(self._tree, key)
Enter fullscreen mode Exit fullscreen mode

Again, we delegate the deletion to the tree with the tree.delete function. Because the deletion is somewhat tricky but can be copied from the Introduction to Algorithms book, we refer to tree.py in the accompanying repository for its implementation.

Having tackled deletion, our list of requirements now looks as following:

  1. Key-value pairs added to our sorted dictionary can be searched.
  2. Adding a key that already exists overwrites the existing one.
  3. Keys are always sorted.
  4. Searching for a non-existing key raises a KeyError.
  5. Deleting a key and then searching for it raises a KeyError

As our last example, we'll write a property-based test for the third requirement: Keys are always sorted. The property for final requirement (Searching for a non-existing key raises a KeyError) can be found in test_sorted_dict.py in the accompanying repository.

Ensuring keys are sorted

A property like "Keys are always sorted" is an invariant: Whatever operations are performed, it should remain true. Assuming SortedDict has a keys() method returning the list of keys, we can write a property-based test ensuring that all the instances of SortedDict generated by some_sorted_dicts() have sorted keys:

# test_sorted_dict.py
@given(dict_and_values=some_sorted_dicts())
def test_keys_sorted(dict_and_values):
    """
    Invariant: Keys in the sorted dictionary are sorted.
    """
    sorted_dict, _ = dict_and_values
    keys = sorted_dict.keys()
    assert keys == sorted(keys)
Enter fullscreen mode Exit fullscreen mode

The implementation for sorted_dict.keys() uses in-order walk to traverse the tree. The implementation can be found in sorted_dict.py and tree.py in the accompanying repository.

The previous property ensures that keys are always sorted after they are inserted. But this property should also be checked after deletion. So, we should make sure that the invariant holds in our earlier deletion property:

# test_sorted_dict.py

@given(
    dict_and_values=some_sorted_dicts().filter(lambda drawn: len(drawn[1]) > 0),
    data=some.data(),
)
def test_search_after_delete(dict_and_values, data):
    ...
    keys = sorted_dict.keys()
    assert keys == sorted(keys)
Enter fullscreen mode Exit fullscreen mode

While this works, there's something unsatisfactory about it. The premise of property-based testing is that we can cover all kinds of unexpected cases. In test_keys_sorted and test_search_after_delete, we're hard-coding the cases where the invariant should be tested. Is there something we can do to randomize testing the invariant?

Enter stateful testing. With stateful tests, we define operations that can be run, but the framework decides the order.

With Hypothesis, we can use RuleBasedStateMachine for that. We won't go into details, but here's an example for SortedDict:

# test_sorted_dict.py
class StatefulDictStateMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.sorted_dict = SortedDict()
        self.state = {}

    inserted_keys = Bundle("inserted")
    deleted_keys = Bundle("deleted_keys")

    @rule(target=inserted_keys, key=some.integers(), value=some.text())
    def insert(self, key, value):
        event("Inserting key")
        self.sorted_dict[key] = value
        self.state[key] = value
        return key

    @rule(key=inserted_keys)
    def search(self, key):
        # A key inserted before may have already been
        # deleted if it was a duplicate, so searching it
        # may not succeed. Check the key exists in
        # the model dictionary.
        assume(key in self.state)
        event("Searching existing key")
        assert self.sorted_dict[key] == self.state[key]

    @rule(key=consumes(inserted_keys))
    def delete(self, key):
        assume(key in self.state)
        event("Deleting key")
        del self.sorted_dict[key]
        del self.state[key]

    @rule(key=some.integers())
    def search_non_existing(self, key):
        assume(key not in self.state)
        event("Searching non-existing key")
        with pytest.raises(KeyError):  # type: ignore
            self.sorted_dict[key]

    @invariant()
    def keys_sorted(self):
        keys = self.sorted_dict.keys()
        assert keys == sorted(keys)


TestStatefulDict = StatefulDictStateMachine.TestCase
Enter fullscreen mode Exit fullscreen mode

The methods defined in our RuleBasedStateMachine, decorated with @rule, define the commands the state machine runs. The decorator @invariant ensures that we check after every command if the keys are being sorted. We also keep a model of our current expected state in self.state to know what keys our SortedDict is expected to contain.

Note: Stateful tests, like the one above, can work wonders for revealing tricky bugs. However, they can be very hard to debug, so always try to cover as many cases as possible with regular unit and property-based tests.

Final touch: Add doctest

Remember how we played around with the expected API of our SortedDict to come up with the properties? For example, we assumed the following would hold:

>>> # Insert and search
>>> sorted_dict = SortedDict()
>>> sorted_dict[2] = 'two'
>>> sorted_dict[2]
'two'
Enter fullscreen mode Exit fullscreen mode

It would be great to ensure these smaller examples also hold for our implementation. Writing properties can be complex and error-prone, so it's valuable to have these human-readable examples to serve as anchors. They ensure that no matter what happens, our code works as expected at least with small example inputs.

These tests serve as documentation, which means they make excellent doctests. We can add the tests at the top of our sorted_dict.py module in a docstring:

# sorted_dict.py
"""
Sorted, mutable dictionary
keeping keys in sorted order.

Examples:

>>> # Insert and search
>>> sorted_dict = SortedDict()
>>> sorted_dict[2] = 'two'
>>> sorted_dict[2]
'two'
>>> # Handles duplicate keys
>>> sorted_dict[2] = 'two-two'
>>> sorted_dict[2]
'two-two'
...
"""
Enter fullscreen mode Exit fullscreen mode

In pytest, we can enable doctests with the --doctest-modules flag. Add the following in your pytest.ini to always run doctests:

# pytest.ini
[pytest]
addopts = --doctest-modules
# Ignore extraneous whitespaces and exception details.
doctest_optionflags= NORMALIZE_WHITESPACE IGNORE_EXCEPTION_DETAIL
Enter fullscreen mode Exit fullscreen mode

When pytest is run, it now also runs the examples from the documentation. And with that, we're done implementing the features we set out to implement for our SortedDict!

Conclusion

In this article, we learned how to apply property-based testing to test-driven development. We also used this principle to develop a sorted dictionary. Thinking in terms of properties instead of concrete examples can help us slow down and be precise about what our code is expected to do.

It's important to remember that property-based testing doesn't replace regular unit tests. Instead, it provides a new tool to our testing toolboxes. And sometimes, property-based testing isn't the right tool for the job. For an interesting article on when property-based testing shines, take a look at this article by Brujo Benavides. However, the more experienced we get at writing test case generators and property-based tests, the more natural it becomes to use them for testing almost any kind of code.

Thank you for reading! As always, we're happy to hear your feedback.

Additional resources

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .