In this post, I'll show you how to deploy a very simple Python web app with very long tests. I'll also show you how to speed up those tests significantly using parallelism.
If you are already familiar with Heroku and just want to go straight to the point, go directly to part 3.
It's testing time.
For my last project (a web scraping API), we decided to have part of our infrastructure on Heroku. The reason was simple: neither my co-founder nor I was very good at the ops side of dev, so we chose the simplest, most time-efficient way to deploy our app: Heroku. Prior to this we'd had middling experience with AWS, in particular EBS.
Make no mistake – this simplicity comes at a price, and Heroku is crazy expensive. But their free plan is very good for side projects such as a Twitch SMS notificator 😎.
So as I said, I've been using Heroku for quite some time. From the beginning, we have used its lightweight and simple CI integration, which automatically deploys our application every time we push, if and only if all our tests pass.
Nothing new under the sun here.
In this post you will see how to easily deploy a Heroku application and set up continuous integration. But more importantly, you will see how to parallelize tests. Again, if you are already familiar with deploying a Heroku application and setting up continuous integration, go directly to the section on parallelising tests.
First, deploy an app on Heroku:
If you don't already have one, you need to create a Heroku account. You also need to download and install the Heroku CLI.
I've provided a test project on Github, so do not hesitate to check it out if you need help bootstrapping this tutorial.
You can clone this repo, cd into it, and run heroku create --app <app name>. Then, if you go to your app dashboard, you'll see your new app.
OK, now comes the interesting part – just go onto your dashboard and click on the name of your newly created app, then go to the "deploy" panel.
We will now link this Heroku app with your Github repo. This is rather easy: simply click on "Github" in the "Deployment method" section, add your repo in the "App connected to Github" section, and don't forget to click "Enable automatic deploys" in the "Automatic deploys" section.
Once everything is set up, it should look a little bit like this:
If you go over to "Settings -> Domains" you should see the domain where your app is live.
So now your app is live, and every time you push to Github a new deploy will take place.
Then add tests and CI:
To run tests on Heroku before each deploy, all you have to do is tick "Wait for CI to pass before deploy" in the deploy section of your app.
You also need to add your application to a Heroku pipeline.
Doing this is really easy: just go to the Deploy tab of your application and create a new Pipeline with the name of your choice.
You now have access to the Pipeline view, where you can click on your previously deployed app.
Go over to the Tests tab, link your Github repo, and click on "Enable Heroku CI". Be aware, this option costs $10 a month.
Let's go back to our code. The test file is already written, and now, all you have to do to trigger the magic is simply to push to master.
git commit --allow-empty -m "Trigger heroku" && git push origin master
And now, the app won't deploy right away – Heroku will wait for tests to pass before deploying. You can check what's going on behind the curtain on the Test tab.
The command that is run during the test is defined in the app.json file.
As you can see, tests are now being run sequentially on Heroku. If you look at the slow-tests.py file, you will see that I defined my tests using pytest.mark.parametrize, which allows me to generate multiple tests from one line:
import time
import pytest

@pytest.mark.parametrize("wait_time", [5] * 20)
def test_slow(wait_time):
    time.sleep(wait_time)
    assert True
This decorator means that the test will be run 20 times with wait_time=5.
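As a quick sanity check on what that parameter list implies, here is a tiny sketch in plain Python (no pytest needed). The variable names are mine, not taken from the repo, and it only counts the sleeps; it ignores pytest's own overhead.

```python
# The list handed to parametrize: twenty copies of the value 5.
wait_times = [5] * 20

# pytest generates one test case per element of the list.
number_of_cases = len(wait_times)

# Run one after another, the sleeps alone add up to this many seconds.
minimum_sequential_seconds = sum(wait_times)

print(number_of_cases, minimum_sequential_seconds)
```

So a single dyno running the whole file spends at least 100 seconds just sleeping.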
As you can see in Heroku, this test suite is (artificially) rather slow:
Parallelising tests on Heroku
As stated in the docs, Heroku makes it easy to parallelise tests. To launch your tests on multiple dynos at the same time, you just have to tweak your app.json file a little bit.
{
  "environments": {
    "test": {
      "scripts": {
        "test-setup": "pip install -r requirements.txt",
        "test": "pytest --tap-stream slow-tests.py"
      },
      "formation": {
        "test": {
          "quantity": 12
        }
      }
    }
  },
  "buildpacks": [{ "url": "heroku/python" }]
}
The quantity key tells Heroku how many dynos you want to run your tests on. From now on, pushing to master will launch the tests on 12 dynos. But stopping here won't make your tests faster, because the entire test suite will be run on each of the 12 dynos. What we want is to run 1/12 of all tests on each of the 12 dynos.
It is actually easy to check:
Tests were run on 12 dynos, but were not that much faster. So now comes the tricky and unfortunately not very well documented part: how do we tell Heroku to run 1/12 of the test suite on each of the 12 dynos?
Splitting up tests
To do this we will use 2 environment variables set by Heroku and accessible on each dyno, CI_NODE_TOTAL and CI_NODE_INDEX. The first one indicates the total number of dynos on which the tests are run, and the second one indicates which of those dynos you are currently on.
Let's see how to use them. pytest lets you modify the list of test items that are going to be executed during the test phase, through the pytest_collection_modifyitems hook. To implement this hook, just put this snippet of code in your conftest.py file:
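import os


def pytest_collection_modifyitems(items, config):
    # Both variables are set by Heroku CI; the defaults keep local runs unsharded.
    ci_node_total = int(os.getenv("CI_NODE_TOTAL", 1))
    ci_node_index = int(os.getenv("CI_NODE_INDEX", 0))
    # Slice assignment mutates the list pytest passed in instead of rebinding it.
    items[:] = [
        item
        for index, item in enumerate(items)
        if index % ci_node_total == ci_node_index
    ]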
This hook modifies, in place, the list of test items that are going to be run. It does not return anything, which is why you have to update the list in place (hence items[:] = ... rather than items = ...). Mutating an argument like this is usually an example of what not to do, but that is not the subject of this post.
You have to keep in mind that this snippet is run on every test node. On every test node, CI_NODE_TOTAL is the same and CI_NODE_INDEX is different, so by only keeping tests whose index in items modulo CI_NODE_TOTAL equals CI_NODE_INDEX we ensure 2 things:
- every node runs roughly 1 / CI_NODE_TOTAL of the tests
- every test originally in items ends up being run, on exactly one node
If it is not clear, imagine that I have 24 tests in items: [t1, t2, ..., t24], split across 12 dynos. This snippet of code, executed on dyno number 1 (CI_NODE_INDEX = 0), will update the items variable such that, at the end of pytest_collection_modifyitems, we have items = [t1, t13]. Then on dyno number 2 (CI_NODE_INDEX = 1) we have items = [t2, t14], and so on.
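If you want to convince yourself without pushing anything, the same filter can be simulated locally. This is only a sketch: the shard helper and the t1..t24 names are made up for illustration, and plain strings stand in for pytest's test items.

```python
def shard(items, node_total, node_index):
    """Keep the items whose position modulo node_total equals node_index."""
    return [
        item
        for index, item in enumerate(items)
        if index % node_total == node_index
    ]


# 24 fake tests, named t1 .. t24, split across 12 simulated dynos.
tests = [f"t{n}" for n in range(1, 25)]
node_total = 12

# One shard per value of CI_NODE_INDEX (0 .. node_total - 1).
shards = [shard(tests, node_total, index) for index in range(node_total)]

print(shards[0])  # ['t1', 't13']
print(shards[1])  # ['t2', 't14']

# Every test lands in exactly one shard.
all_sharded = [test for one_shard in shards for test in one_shard]
print(sorted(all_sharded) == sorted(tests))  # True
```

Each simulated dyno gets 2 of the 24 tests, and putting the shards back together gives the original list with nothing lost or duplicated.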
And here is what happens on Heroku once we push:
As you can see, we did not manage to divide the time by 12. The reason is simple: each dyno takes about 30 seconds to boot, and this time cannot be reduced. But we managed to divide the time by 2, and more importantly, we can parallelize our tests across up to 32 dynos, so there is plenty of room for improvement.
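To see why a fixed boot time caps the gain, here is a rough back-of-the-envelope model. It is an assumption of mine, not something Heroku documents: it only counts the roughly 30 seconds of boot plus the sleeps of the tests that land on the busiest dyno, and it ignores test-setup (the pip install) and scheduling, so real runs will be slower than these numbers.

```python
import math


def estimated_wall_clock(test_count, seconds_per_test, dynos, boot_seconds=30):
    """Rough estimate: fixed boot time plus the busiest dyno's share of tests."""
    tests_on_busiest_dyno = math.ceil(test_count / dynos)
    return boot_seconds + tests_on_busiest_dyno * seconds_per_test


# 20 tests of 5 seconds each, as in slow-tests.py.
for dynos in (1, 12, 32):
    print(dynos, estimated_wall_clock(20, 5, dynos))
```

Under this model the estimate never drops below the boot time plus one test, no matter how many dynos you add, which is why the speed-up is far from linear.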
Thank you for reading
I had trouble finding documentation about parallelising tests on Heroku in Python, and I really hope you liked this post and that it will speed up your deployment time on Heroku. All source code is freely available here on Github.
I frequently blog about Python and web scraping. Actually, I recently wrote a Python web-scraping guide that got some nice attention from Reddit 😎, so don't hesitate to check it out.
You can follow me here on Twitter so you don't miss any of my future blog posts.