Anatomy of a high-velocity CI/CD pipeline

RJ Zaworski - Dec 2 '21 - Dev Community

If you’re going to optimize your development process for one thing, make it speed. Not the kind of speed that racks up technical debt on the team credit card or burns everyone out with breathless sprints, though. No, the kind of speed that treats time as your most precious resource, which it is.

Speed is the startup’s greatest advantage. Speed means not wasting time. Incorporating new information as soon as it’s available. Getting products to market. Learning from customers. And responding quickly when problems occur. But speed with no safeguards is simply recklessness. Moving fast requires systems for ensuring we’re still on the rails.

We’ve woven many such systems into the sociotechnical fabric of our startup, but maybe the most crucial among them are the continuous integration and continuous delivery processes that keep our work moving swiftly towards production.

The business case for CI/CD

Writing in 2021, it’s hard to imagine building web applications without the benefits of continuous integration, continuous delivery, or both. Running an effective CI/CD pipeline won’t score points in a sales pitch or (most) investor decks, but it can make significant strategic contributions to both business outcomes and developer quality of life. The virtuous cycle goes something like this:

  • faster feedback
  • fewer bugs
  • increased confidence
  • faster releases
  • more feedback (even faster this time)

Even on teams (like ours!) that haven’t embraced the dogma (or overhead) of capital-A-Agile processes, having the confidence to release early and often still unlocks shorter development cycles and reduces time to market.

As a developer, you’re probably already bought into this idea. If you’re feeling resistance, though, here’s a quick summary for the boss:

Graphic illustrating the business case for continuous integration and delivery: feedback, quality, confidence, velocity.

The business case for continuous integration and delivery

Is CI/CD worth the effort?

Nobody likes a red build status indicator, but the truth is that builds fail. That’s why status dashboards exist, and a dashboard glowing crimson in the light of failing builds is much, much better than no dashboard at all.

Still, that dashboard (never mind the systems and subsystems it’s reporting on) is pure overhead. Not only are you on the hook to maintain code and release a dozen new features by the end of the week, but you’re also responsible for the litany of scripts, tests, configuration files, and dashboards needed to build, verify, and deploy it all. When the server farm of Mac Minis in the basement hangs, you’re on the hook to restart it. That’s less time available to actually build the app.

This is a false dilemma, though. You can solve this problem by throwing resources at it. Managed services eliminate much of the maintenance burden, and when you’ve reached the scale where one-size-fits-all managed services break down you can likely afford to pay a full-time employee to manage Jenkins.

So, there are excuses for not having a reliable CI/CD pipeline. They just aren’t very good ones. The payoff — in confidence, quality, velocity, learning, or whatever you hope to get out of shipping more software — is well worth any pain the pipeline incurs.

Yes, even if it has to pass through Xcode.

A guiding principle

Rather than prescribing the ultimate CI/CD pipeline in an edict from on high, we’ve taken guidance from one of our team principles and evolved our practices and automation from there. It reads:

Ship to Learn. We release the moment that staging is better than prod, listen early and often, and move faster because of it.

Continuous integration is a big part of the story, of course, but the same guidance applies back to the pipeline itself.

  1. Releasing the moment that staging is better than prod is easier said than done. Staging is nearly always better, and keeping up with it means having both a lightweight release process and confidence in our work. Individual investment and a reasonably robust test suite are all well and good; better is having a CI/CD pipeline that makes them the norm (if not the rule).
  2. Listening early and often is all about gathering feedback as quickly as we possibly can. The sooner we understand whether something is working or not, the faster we can know whether to double down or adapt. Feedback in seconds is better than in minutes (and certainly better than hours).
  3. Moving faster includes product velocity, of course, but also the CI/CD process itself. Over time we’ve automated what we reasonably can; still, several exception-heavy stages remain in human hands, and we don’t expect that to change soon. Here, "moving fast" means making manual review and acceptance testing as quick and painless as possible rather than replacing them entirely.

So, our pipeline

Product velocity depends on the pipeline that enables it. With that in mind, we’ve constructed our pipeline around the hypothesis that issues uncovered at any stage are exponentially more expensive to fix than those caught at prior stages. Issues will happen, but checks that uncover them early drastically reduce friction at the later, more expensive stages of the pipeline.

Here’s the boss-friendly version:

A CI/CD pipeline and the time required to test at each stage

Test early, test often

Local development

Continuous integration starts immediately. If you disagree, consider the feedback time needed to integrate and test locally versus anywhere else: seconds (rebasing against our main branch, or acting on feedback from a pair-programming partner) or, at most, minutes (a full run of our test suite).

We’ve made much of it automatic. Our editors are configured to take care of styles and formatting; TypeScript provides a first layer of testing; and shared git hooks run project-specific static checks.
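
As a rough sketch of what those shared hooks might delegate to (the script name and exact commands are illustrative assumptions, not our actual setup), a pre-commit check runner only needs to fail fast when a static check fails:

```typescript
// precommit.ts — a minimal sketch of a shared pre-commit check runner.
// Assumes a Node/TypeScript project with tsc and eslint installed;
// wire it up from .git/hooks/pre-commit (or a tool like husky).
import { execSync } from "node:child_process";

const checks: Array<[name: string, command: string]> = [
  ["typecheck", "npx tsc --noEmit"],         // first layer of testing: the type checker
  ["lint", "npx eslint . --max-warnings 0"], // project-specific static checks
];

for (const [name, command] of checks) {
  try {
    console.log(`Running ${name}...`);
    execSync(command, { stdio: "inherit" });
  } catch {
    console.error(`${name} failed. Fix it before committing.`);
    process.exit(1);
  }
}

console.log("All pre-commit checks passed.");
```

Keeping the list short is the point: only checks that finish in seconds belong in a hook.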

One check we don’t enforce is to run our full test suite. Run time goes up linearly with the size of a test suite, and — while we’re culturally averse to writing tests for their own sake — running our entire suite on every commit would be prohibitively expensive. What needs testing is up to individual developers’ discretion, and we avoid adding redundant or pointless tests to the test suite just as we avoid redundant test runs.

Make it fast, remember? That applies to local checks, too. Fast checks get run. Slow checks? No-one has time for that.

Automated CI

Changes pushed from local development to our central repository trigger the next layer of checks in the CI pipeline. Feedback here is slower than in local development but still fairly fast, requiring about 10 minutes to run all tests and produce a viable build.

Here’s what it looks like in Github:

Screenshot of automated tests passing in Github’s UI

Green checks are good checks.

There are several things going on here: repeats of the linting and static analysis run locally, a run through our complete backend test suite, and deployment of artifacts used in manual QA. The other checks are variations on this theme—different scripts poking and prodding the commit from different angles to ensure it's ready for merging into main. Depending on the nature of the change, we may require up to a dozen checks to pass before the commit is greenlit for merge.
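
As a sketch of how that fan-out might be wired (the commands and stage names below are assumptions for illustration; in practice each check is typically declared as a separate job in the CI provider's own configuration), a single entry point could run every check and report all failures at once:

```typescript
// ci-checks.ts — a sketch of the kinds of checks a pushed commit might face.
// Commands are illustrative; real pipelines usually run these as parallel CI jobs.
import { execSync } from "node:child_process";

interface Check {
  name: string;
  command: string;
}

const checks: Check[] = [
  { name: "lint + static analysis", command: "npx eslint ." },
  { name: "typecheck", command: "npx tsc --noEmit" },
  { name: "backend tests", command: "npm test" },
  { name: "build QA artifact", command: "npm run build" },
];

const failures: string[] = [];

for (const check of checks) {
  console.log(`Running ${check.name}...`);
  try {
    execSync(check.command, { stdio: "inherit" });
  } catch {
    failures.push(check.name); // keep going so the PR shows every red check, not just the first
  }
}

if (failures.length > 0) {
  console.error(`Failed checks: ${failures.join(", ")}`);
  process.exit(1); // a non-zero exit is what turns the check red in the PR UI
}

console.log("All checks green and ready for review.");
```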

Peer review

In tandem with the automated CI checks, we require manual review and sign-off before changes can be merged into main.

“Manual!?” I hear the purists cry, and yes — the “M” word runs counter to the platonic ideal of totally automated CI. Hear me out. The truth is that every step in our CI/CD pipeline existed as a manual process first. Automating something before truly understanding it is a sure path to inappropriate abstractions, maintenance burden, and at least a few choice words from future generations. And full automation doesn’t always make sense: for processes that are and always will be dominated by exceptions (design review and acceptance testing, to pick two common examples), we’ve traded any aspirations of full automation for tooling that enables manual review. We don’t expect to change this any time soon.

Manual review for us consists of (required) code review and (optional) design review. Code review covers a checklist of logical, quality, and security concerns, and we (plus Github branch protection) require at least two team members to believe a change is a good idea before we ship it. Besides collective ownership, it’s also a chance to apply a modicum of QA and build shared understanding around what’s changing in the codebase. Ideally, functional issues that weren’t caught locally get caught here.
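
The "at least two" rule is enforced by the platform rather than by convention. As a hedged sketch (owner, repo, and token handling are placeholders; check Github's branch protection docs before reusing any of this), the requirement can be set through the REST API:

```typescript
// protect-main.ts — a sketch of requiring two approving reviews on main
// via Github's branch protection REST API. Owner, repo, and token are
// placeholders, and the global fetch assumes Node 18+.
const owner = "example-org";
const repo = "example-app";
const branch = "main";

async function requireTwoApprovals(): Promise<void> {
  const response = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/branches/${branch}/protection`,
    {
      method: "PUT",
      headers: {
        Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
        Accept: "application/vnd.github+json",
      },
      body: JSON.stringify({
        required_status_checks: { strict: true, contexts: [] }, // branch must be up to date with main
        enforce_admins: true,
        required_pull_request_reviews: { required_approving_review_count: 2 },
        restrictions: null,
      }),
    }
  );

  if (!response.ok) {
    throw new Error(`Branch protection update failed: ${response.status}`);
  }
}

requireTwoApprovals().catch((err) => {
  console.error(err);
  process.exit(1);
});
```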

Design review

Design review is typically run in tandem with our counterparts in product and design, and aims to ensure that designs are implemented to spec. We provide two channels for reviewing changes before a pull request is merged:

  1. preview builds of a “live” application that reviewers can interact with directly
  2. storybook builds that showcase specific UI elements included within the change

Both the preview and storybook builds are linked from Github’s pull request UI as soon as they’re available. They also nicely illustrate the kinds of tradeoffs we’ve frequently made between complexity (neither build is trivial to set up and maintain), automation (know what would be trickier? Automatic visual regression testing, that’s what) and manual enablement (the time we’ve invested has proven well worth it).
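
Getting those links in front of reviewers doesn’t require anything exotic. One minimal sketch of the idea (the URL patterns, environment variables, and repository names are assumptions, not our actual setup) is a deploy step that comments on the pull request once both builds are live:

```typescript
// comment-previews.ts — a sketch of linking preview and storybook builds from a PR.
// URL patterns and environment variables are illustrative; assumes Node 18+ for fetch.
const owner = "example-org";
const repo = "example-app";
const prNumber = process.env.PR_NUMBER; // provided by the CI environment
const sha = process.env.COMMIT_SHA;     // the commit both builds were produced from

const body = [
  "**Builds ready for review**",
  `- Preview app: https://preview-${sha}.example.dev`,
  `- Storybook: https://storybook-${sha}.example.dev`,
].join("\n");

async function postComment(): Promise<void> {
  const response = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/issues/${prNumber}/comments`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
        Accept: "application/vnd.github+json",
      },
      body: JSON.stringify({ body }),
    }
  );

  if (!response.ok) {
    throw new Error(`Failed to post preview links: ${response.status}`);
  }
}

postComment().catch((err) => {
  console.error(err);
  process.exit(1);
});
```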

The bottom line is that — just like with code review — we would prefer to catch design issues while pairing up with the designer during initial development. But if something slipped through, design review lets us respond more quickly than at stages further down the line.

The feedback from manual review steps is still available quickly, though: generally within an hour or two of a new pull request being opened. And then it’s on to our staging environment.

Continuous delivery to staging

Merging a pull request into our main branch finally flips the coin from continuous integration to continuous delivery. There's one more CI pass first, however: since we identify builds by the commit hash they're built from, a merge commit in main triggers a new CI run that produces the build artifact we deliver to our staging environment.
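
A sketch of the idea, with the packaging and deploy commands standing in as hypothetical placeholders for our actual tooling:

```typescript
// deploy-staging.ts — a sketch of promoting the merge commit's build to staging.
// The artifact layout and deploy script are hypothetical stand-ins.
import { execSync } from "node:child_process";

// The build is identified by the commit it was produced from.
const sha = execSync("git rev-parse HEAD").toString().trim();
const artifact = `app-${sha}.tar.gz`;

// Package the CI build output under that name...
execSync(`tar -czf ${artifact} dist/`, { stdio: "inherit" });

// ...and hand it to the (hypothetical) deploy tooling targeting staging.
execSync(`./scripts/deploy.sh staging ${artifact}`, { stdio: "inherit" });

console.log(`Delivered ${artifact} to staging`);
```

Naming the artifact after the commit makes it trivial to trace whatever is running in staging back to the exact change that produced it.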

The process for vetting a staging build is less prescriptive than for the stages that precede it. Most of the decision around how much QA or acceptance testing to run in staging rests with the developer on-call (who doubles as our de-facto release manager), who will review a list of changes and call for validation as needed. A release consisting of well-tested refactoring may get very little attention. A major feature may involve multiple QA runs and pull in stakeholders from our product, customer success, and marketing teams. Most releases sit somewhere in the middle.

Every staging release receives at least passing notice, for the simple reason that we use Koan ourselves — and specifically, an instance hosted in the staging environment. We eat our own dogfood, and a flavor that’s always slightly ahead of the one our customers are using in production.

Staging feedback isn’t without hiccups. At any time we’re likely to have 3–10 feature flags gating various in-development features, and the gap between staging and production configurations can lead to team members reporting false positives on features that aren’t yet ready for release. To narrow that gap, we’ve invested in internal tooling that allows team members to adopt a specific production configuration in their local or staging environments.

Internal UI for forcing production configuration in staging environment — the design team loves this one.

The aesthetics are edgy (controversial, even), but the value is undeniable. We’re able to freely build and test features prior to production release, and then easily verify whether a pre-release bug will actually manifest in the production version of the app.
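
In code, the idea behind both the flags and the override is simple. A minimal sketch, with hypothetical flag names and a deliberately tiny config:

```typescript
// featureFlags.ts — a sketch of environment-specific flags with a
// "force production config" override. Flag names here are hypothetical.
type Environment = "production" | "staging" | "local";

const flagsByEnvironment: Record<Environment, Record<string, boolean>> = {
  production: { newGoalEditor: false },
  staging: { newGoalEditor: true }, // staging runs slightly ahead of production
  local: { newGoalEditor: true },
};

export function isEnabled(
  flag: string,
  env: Environment,
  forceProductionConfig = false
): boolean {
  // When chasing a pre-release bug report, adopt production's flag set
  // even though the code is running in staging or locally.
  const effectiveEnv: Environment = forceProductionConfig ? "production" : env;
  return flagsByEnvironment[effectiveEnv][flag] ?? false;
}

// Would this report reproduce with production's configuration?
console.log(isEnabled("newGoalEditor", "staging"));       // true: staging behavior
console.log(isEnabled("newGoalEditor", "staging", true)); // false: what production would do
```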

If you’re sensing that issues caught in staging are more expensive to diagnose and fix than those caught earlier on, you’d be right. Feedback here is much slower than at earlier stages, with detection and resolution taking up to several hours. But issues caught in staging are still much easier to address before they’re released to production.

Manual release to production

The “I” in CI is unambiguous. Different teams may take “integration” to mean different things — note the inclusion of critical-if-not-exactly-continuous manual reviews in our own integration process — but “I” always means “integration.”

The “D” is less straightforward, standing in (depending on who you’re talking to, the phase of the moon, and the day of the week) for either “Delivery” or “Deployment,” and they’re not quite the same thing. We’ve gained enormous value from Continuous Delivery. We haven’t made the leap (or investment) to deploy directly to production.

That’s a conscious decision. Manual QA and acceptance testing have proven tremendously helpful in getting the product right. Keeping a human in the loop ahead of production helps ensure that we connect with relevant stakeholders (in product, growth, and even key external accounts) prior to our otherwise-frequent releases.

Testing in production

As the joke goes, we test comprehensively: all issues missed by our test suite will be caught in production. There aren’t many of these, fortunately, but a broad enough definition of testing ought to encompass the instrumentation, monitoring, alerting, and customer feedback that help us identify defects in our production environment.

We’ve previously shared an outline of our cherished (seriously!) on-call rotation, and the instrumentation beneath it is a discussion for another day, but suffice to say that an issue caught in production takes much longer to fix than one caught locally. Add in the context-switching required from team members who have already moved on to other things, and it’s no wonder we’ve invested in catching issues earlier on!

Revising the pipeline

Increasing velocity means adding people, reducing friction, or (better yet) both. Hiring is a general problem; friction is specific to the team, codebase, and pipeline in question. We adopted TypeScript to shorten feedback cycles (and save ourselves runtime exceptions and PagerDuty incidents). That was an easy one.

A less obvious bottleneck was how much time our pull requests were spending waiting for code review — on average, around 26 hours prior to merge. Three and a half business days. On average. We were still deploying several times per day, but with several days’ worth of work-in-process backed up in the queue and plenty of context switching whenever any of it needed adjustment.

Here’s how review times tracked over time:

Chart of code review time showing a significant drop in March 2021

This chart is fairly cyclical, with peaks and troughs corresponding roughly to the beginning and end of major releases — big, controversial changes as we’re trailblazing a new feature; smaller, almost-trivial punchlist items as we close in on release day. But the elephant in the series lands back around March 1st. That was the start of Q2, and the day we added “Code Review Vitals” to our dashboard.

It’s been said that sunlight cures all ills, and simply measuring our workflow had the dual effects of revealing a significant bottleneck and inspiring the behavioral changes needed to correct it.
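
Getting that first measurement doesn’t take much. Here’s a rough sketch of how a single "time to merge" vital might be pulled from Github’s API (repository details are placeholders, and a real dashboard would be fussier about business hours and outliers):

```typescript
// review-vitals.ts — a sketch of measuring how long merged PRs waited before merge.
// Repo details are placeholders; assumes Node 18+ for the global fetch.
const owner = "example-org";
const repo = "example-app";

interface PullRequest {
  created_at: string;
  merged_at: string | null;
}

async function averageHoursToMerge(): Promise<number> {
  const response = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/pulls?state=closed&per_page=100`,
    { headers: { Accept: "application/vnd.github+json" } }
  );
  const pulls = (await response.json()) as PullRequest[];
  const merged = pulls.filter((pr) => pr.merged_at !== null);

  const totalHours = merged.reduce((sum, pr) => {
    const openedAt = new Date(pr.created_at).getTime();
    const mergedAt = new Date(pr.merged_at as string).getTime();
    return sum + (mergedAt - openedAt) / (1000 * 60 * 60);
  }, 0);

  return totalHours / merged.length;
}

averageHoursToMerge().then((hours) =>
  console.log(`Average time to merge: ${hours.toFixed(1)} hours`)
);
```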

Voilà! More speed.

Conclusion

By the time you read this post, odds are that our CI/CD pipeline has already evolved forward from the state described above. Iteration applies as much to process as to the software itself. We’re still learning, and — just like with new features — the more we know, and the sooner we know it, the better off we’ll be.

With that, a humble question: what have you learned from your own CI/CD practices? Are there checks that have worked (or totally flopped) that we should be incorporating ourselves?

We’d love to hear from you!
