We often joke that the only thing harder than building a distributed system is testing it. And as with most jokes, there is a truth at its core. Distributed systems are difficult to test because their complexity grows in two directions: many parts move concurrently, and the medium that connects them is itself unreliable. Together these often lead to failure scenarios that are as difficult to predict as they are to reproduce.
Property-based testing is a method in which a function is executed repeatedly over a wide range of (possibly random) inputs in order to find cases where a specified property is violated. It can be further enhanced with a refinement process, often called shrinking, that reduces a failing input to a minimal case that still violates the property. This method often helps discover, early on, the sort of edge-case bugs that would otherwise only crop up in production.
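To make the idea concrete, here is a minimal sketch of property-based testing with refinement, written in Python using only the standard library. It is not taken from Wallaroo's test suite: the function under test (`buggy_dedupe`), the property (`prop_no_duplicates`), and the helpers `find_failure` and `shrink` are all hypothetical names invented for illustration, and the greedy shrinker is a simple stand-in for the more sophisticated refinement strategies that real tools use.

```python
import random


def buggy_dedupe(items):
    """Hypothetical function under test: meant to drop all duplicates.

    Deliberate bug: it only collapses *adjacent* duplicates.
    """
    out = []
    for x in items:
        if not out or out[-1] != x:
            out.append(x)
    return out


def prop_no_duplicates(items):
    """Property: the deduplicated output contains no repeated values."""
    result = buggy_dedupe(items)
    return len(result) == len(set(result))


def find_failure(prop, rng, trials=1000):
    """Generation and execution: try random inputs until the property fails."""
    for _ in range(trials):
        items = [rng.randint(0, 5) for _ in range(rng.randint(0, 12))]
        if not prop(items):
            return items
    return None


def shrink(prop, failing):
    """Refinement: greedily drop one element at a time while the
    property still fails, until no single removal keeps it failing."""
    current = list(failing)
    progress = True
    while progress:
        progress = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if not prop(candidate):
                current = candidate
                progress = True
                break
    return current


rng = random.Random(0)  # fixed seed so the run is reproducible
failing = find_failure(prop_no_duplicates, rng)
assert failing is not None, "no counterexample found in the trial budget"
minimal = shrink(prop_no_duplicates, failing)
print("failing input:", failing)
print("minimal input:", minimal)
```

For this particular bug, the shrinker always ends on a three-element input of the form `[a, b, a]`, which is the smallest list that exposes the non-adjacent duplicate. In an end-to-end setting the "input" would be something much larger, such as an application topology plus a sequence of system events, but the generate, execute, and refine loop has the same shape.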
When it comes to testing the properties of Wallaroo, a framework for building distributed data applications, the idea of combining property-based testing methods with end-to-end tests seemed like a natural fit. On the one hand, we can leverage the property-based tests to show that Wallaroo's core properties hold over a broad range of application designs and system events (such as cluster scaling, network errors, and process crashes). And on the other hand, the end-to-end tests help show that these applications do, in fact, work in a variety of real-world conditions.
This presentation will focus on the challenges and benefits of using property-based testing in the end-to-end tests of distributed systems like Wallaroo. It will walk through the infrastructure and automation requirements, the signal design theory underpinning the tests, the active cycle of generation-execution-refinement for obtaining minimal reproducing test cases, and finally the results and lessons learned in the process.