One of the trickiest things about automated testing is dealing with I/O. Automated tests that call APIs across a network, write to databases, or otherwise read/write to external sources can be cumbersome to set up, flaky, and slow.
Often these issues are addressed by mocking, but that has its own pain points as well. Mocks can take a lot of time to write, can have bugs, and might obscure issues in the real implementation.
In this article I would like to talk about a pattern that I have found very helpful when writing automated tests, and which I'm going to call Fetch, Process, Store. I'm sure others have discovered the same idea and given it other (probably better) names, but I'm going to roll with that one.
Fetch, Process, Store
The main idea behind this technique is that most operations consist of at least one of the following:
- Fetching data from external sources (databases, HTTP APIs, etc).
- Processing the data.
- Storing/presenting the result.
If we separate those three phases into three sequential, separate steps, it can make our code a lot more testable.
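In skeleton form, the orchestration looks something like this (the function names below are just placeholders for the three steps, not part of any particular API):
async function doTheThing(input) {
  // Fetch: all reads from the outside world happen here.
  const rawData = await fetchData(input);

  // Process: pure logic with no I/O; this is the part that's easy to test.
  const result = processData(rawData);

  // Store: all writes to the outside world happen here.
  await storeResult(result);
}
The rest of this article fills that skeleton in with a concrete CSV-handling example.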
Doing all the things
Here's an example of a function which does all the things. It takes in a URL, fetches CSV data from that URL, gets all of the rows that contain a given string, and writes them out to a file:
import fetch from 'node-fetch';
import fs from 'fs';
import { parse as csvParse } from 'csv-parse/sync';
import { stringify as csvStringify } from 'csv-stringify/sync';
async function fetchAndProcessAndStoreCsv(
  inFileUrl,
  searchString,
  outFilePath,
) {
  const inFileRes = await fetch(inFileUrl);
  if (!inFileRes.ok) {
    throw new Error(`Unexpected status ${inFileRes.status} from ${inFileUrl}`);
  }
  const rawCsv = await inFileRes.text();
  const csvRows = csvParse(rawCsv);
  const matchingRows = csvRows.filter(
    cols => cols.some(val => val.includes(searchString)),
  );
  const outFileRaw = csvStringify(matchingRows);
  await fs.promises.writeFile(outFilePath, outFileRaw);
}
This function is going to be a pain to write a test for. We're going to have to give it a URL and hope we have connectivity, and then we're going to have to read a file and make sure it contains what we expect. That might look something like this in Mocha:
import assert from 'assert';
import fetch from 'node-fetch';
import fs from 'fs';
import { parse as csvParse } from 'csv-parse/sync';
describe('CSV data filtering tests', () => {
  it('Can fetch a CSV, filter rows, and write them to disk', async () => {
    const url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'; // Titanic passenger data set
    const searchString = 'Mrs.';
    const outFilePath = './test_output.csv';
    await fetchAndProcessAndStoreCsv(url, searchString, outFilePath);
    const rawCsv = fs.readFileSync(outFilePath, 'utf8');
    const csvRows = csvParse(rawCsv);
    const rowsContainingSearchString = csvRows.filter(
      cols => cols.some(val => val.includes(searchString)),
    );
    assert(rowsContainingSearchString.length > 0);
    assert(rowsContainingSearchString.length === csvRows.length);
  });
});
Fetch, Process, Store
Here's the same CSV fetch/process/store logic broken out into four functions: one for each step, plus a thin wrapper that ties them together:
import fetch from 'node-fetch';
import fs from 'fs';
import { parse as csvParse } from 'csv-parse/sync';
import { stringify as csvStringify } from 'csv-stringify/sync';
// Fetch: read the raw CSV from the network.
async function doFetch(inFileUrl) {
  const inFileRes = await fetch(inFileUrl);
  if (!inFileRes.ok) {
    throw new Error(`Unexpected status ${inFileRes.status} from ${inFileUrl}`);
  }
  return inFileRes.text();
}

// Process: pure logic, no I/O.
function process(rawCsv, searchString) {
  const csvRows = csvParse(rawCsv);
  const matchingRows = csvRows.filter(
    cols => cols.some(val => val.includes(searchString)),
  );
  return csvStringify(matchingRows);
}

// Store: write the result to disk.
function store(outputFilePath, outFileRaw) {
  return fs.promises.writeFile(outputFilePath, outFileRaw);
}

// Thin wrapper that strings the three steps together.
async function fetchAndProcessAndStoreCsv(
  inFileUrl,
  searchString,
  outFilePath,
) {
  const csvData = await doFetch(inFileUrl);
  const filteredCsvData = process(csvData, searchString);
  await store(outFilePath, filteredCsvData);
}
We still have the function that does all the things (fetchAndProcessAndStoreCsv()), but the details are broken into three separate steps which are independently testable. Here are some tests for them:
import assert from 'assert';
import fetch from 'node-fetch';
import fs from 'fs';
import { parse as csvParse } from 'csv-parse/sync';
describe('CSV data filtering tests', () => {
  it('(FETCH) Can fetch a CSV', async () => {
    const url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'; // Titanic passenger data set
    const csvData = await doFetch(url);
    assert(typeof csvData === 'string');
  });
  it('(PROCESS) Can return only the rows which contain a certain substring', () => {
    const rawCsvData = `
X,Y,Z
A,B,C
`.trim();
    const filtered = process(rawCsvData, 'Y');
    assert(filtered === 'X,Y,Z\n');
  });
  it('(STORE) Can store data as a file', async () => {
    const outFilePath = './test_output.csv';
    await store(outFilePath, 'X,Y,Z');
    const fileContents = await fs.promises.readFile(outFilePath, 'utf8');
    assert(fileContents === 'X,Y,Z');
  });
});
You could also add an end-to-end test which calls the fetchAndProcessAndStoreCsv() function, just like the original test.
Hey wait a second, we're testing the same thing, but with three times as many tests. What's so great about that?
Imagine you want to write more tests: for example, you might want to test how your code behaves if the CSV content is an empty string, or if the last line is empty. It's dead simple to write more tests that call the process() function. On the other hand, in order to use the fetchAndProcessAndStoreCsv() function to test these cases, you would have to go and upload test CSV files to some place that they can be fetch()ed from. That would be painful.
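For instance, a couple of extra PROCESS tests along those lines might look like this (this assumes csv-parse's default behaviour of producing no records for an empty string):
it('(PROCESS) Returns an empty result when the CSV content is empty', () => {
  // With no input records there is nothing to match, so we expect an empty string back.
  assert(process('', 'Y') === '');
});
it('(PROCESS) Ignores a trailing empty line', () => {
  // The trailing newline should not produce a bogus extra row.
  assert(process('X,Y,Z\nA,B,C\n', 'Y') === 'X,Y,Z\n');
});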
Further, you can choose which tests to run. For example, Mocha supports a --grep option for selecting which tests to run. In this case, we could run all of our PROCESS tests with mocha --grep PROCESS (since they have (PROCESS) in their description). This is a nice thing to be able to do, especially if you want to run a test suite as a git pre-commit hook. A thousand tests that don't do any I/O might finish almost instantly, while a thousand tests that make HTTP calls and talk to databases might take quite a while, and might fail randomly.
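For example, a project's package.json could expose the fast subset as its own script (the script names here are just illustrative):
{
  "scripts": {
    "test": "mocha",
    "test:fast": "mocha --grep PROCESS"
  }
}
Wiring the fast script into a pre-commit hook keeps commits quick, while the full suite can still be run separately.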
Next, this approach lets you cut corners in your tests more intelligently. For any project, you have to decide whether writing I/O tests is worth the time and effort (and the flakiness and long run times that come with them). Depending on the project, maybe the answer is "yes". But if it's "no", the Fetch, Process, Store pattern makes it easy to skip the I/O tests while still having good test coverage of your business logic.
Finally, this approach lets you avoid some of the pain points of mocking and dependency injection. Those are valid approaches and do have their place, but I find that the Fetch, Process, Store pattern is simpler and involves less throwaway code in most cases.
When it gets more complicated
The Fetch, Process, Store pattern works best when you're able to fetch all of the data up front, process it, and store it, in that order.
But there are times when you need to fetch something, do processing on it, then fetch something else based on the results of that processing.
You can still use this pattern in that case, by stretching it into more steps: Fetch, Process, Fetch, Process, Store. The important thing is to separate the fetch, process, and store steps, even if it requires making multiples of each.
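As a rough sketch (using the doFetch/process/store helpers from above, plus a hypothetical pickUrls processing step), a Fetch, Process, Fetch, Process, Store flow might look like this:
async function fetchProcessFetchProcessStore(indexUrl, searchString, outFilePath) {
  // Fetch #1: an index CSV that lists other CSV files to look at.
  const indexCsv = await doFetch(indexUrl);

  // Process #1: pure logic that decides which URLs to fetch next (hypothetical helper).
  const urlsToFetch = pickUrls(indexCsv);

  // Fetch #2: the CSVs chosen by the first processing step.
  const csvBodies = await Promise.all(urlsToFetch.map(doFetch));

  // Process #2: pure logic that filters the rows of each fetched CSV.
  const filtered = csvBodies.map(body => process(body, searchString)).join('');

  // Store: write everything out in one final step.
  await store(outFilePath, filtered);
}
Each fetch and process step is still independently testable; only the wrapper touches everything at once.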
Writing end-to-end tests does get more difficult in these situations, and mocking might be a better solution in cases like this.
Conclusion
Writing automated tests is an art, and if you don't design your code with testability in mind, it can be hell. The Fetch, Process, Store pattern is one way to design your code for easy testability, and I have found it very valuable for achieving good test coverage with minimal effort.