Think of a world where you don't need a separate testing environment, where you can test everything in production and capture valuable data that helps you improve along the way. The secret ingredient: feature flags.
If you can't decide if testing in production is a foolish or a genius idea, this tutorial will definitely help.
What are feature flags?
Feature flags are a software engineering technique that lets developers integrate code constantly into the main trunk. It involves shipping incomplete features into production, which remain dormant until ready. Feature flags also take part in software delivery; when the feature is complete, the code can be activated at the flick of a switch.
Feature flags control what code paths are active at a given time. Also known as feature toggles, switchers, or flippers, these flags can be switched on and off — either at build time or at runtime — allowing teams to change the behavior of an application without having to update the code.
Benefits of feature flags
Despite adding a layer of complexity to the codebase, feature flags are powerful when it comes to software delivery:
- Short development cycle: without feature flags, we must hold off deployment of a feature until it's thoroughly tested — a process that can take weeks. With them, we can deploy several times per day, try partially developed features, and get instant feedback.
- Simplified version control: we can do away with long-lived topic branches. Feature flags encourage using trunk-based development. We can merge every day, integrate continuously, minimize merge conflicts, and iterate much more quickly.
- Test in production: new features can be initially enabled only for developers and beta users. No separate testing environment is needed.
- Decouple business and technical decisions: sometimes a feature is ready, but we're not quite ready to publish it. Feature flags allow us to switch it on when it makes the most business sense.
- Fine-grained releases: feature flags permit a high level of control to conduct canary or blue-green releases.
Use cases for feature flags
First and foremost, feature flags are used to release new features. In addition to canary launches, you can do cool things like activating features for a season (think Black Friday) and installing per-user or per-region toggles. No other technique offers such a degree of control.
The roadmap for using feature flags is:
- Code: deploy the new feature, which is initially disabled for everyone.
- Test: when the feature is complete-ish, toggle it on for internal testers and developers.
- Canary/Beta: after sufficient testing and enough iterations, toggle it on for beta users or a percentage of the general population.
- Iterate: collect metrics and usage analytics. Gather feedback. Continue iterating.
- Release: finally, toggle the feature on for everyone.
Deployment and release must be separate. We should decide when to make a feature available based on business parameters, not on technical merits. Feature flags are one way to achieve this.
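The roadmap above can be sketched as a single function that decides who sees a feature at each stage. This is a minimal illustration, not a definitive implementation: the stage names, the `percentage` field, the user fields (`isInternal`, `isBeta`), and the simple hash in `bucketOf` are all assumptions made for this example.

```javascript
// Deterministically maps a user id to a bucket in [0, 100),
// so the same user always lands on the same side of a percentage rollout.
function bucketOf(userId) {
  let hash = 0;
  for (const ch of String(userId)) {
    hash = (hash * 31 + ch.charCodeAt(0)) % 100003;
  }
  return hash % 100;
}

// flag: { stage, percentage }, user: { id, isInternal, isBeta } (hypothetical shapes)
function isEnabledFor(flag, user) {
  switch (flag.stage) {
    case "everyone": // Release
      return true;
    case "beta": // Canary/Beta: testers plus a slice of the general population
      return user.isInternal || user.isBeta || bucketOf(user.id) < (flag.percentage || 0);
    case "internal": // Test: internal testers and developers only
      return Boolean(user.isInternal);
    default: // Code: deployed but dormant for everyone
      return false;
  }
}
```

Moving a feature from one stage to the next only changes the flag's data; the deployed code stays the same.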
Running experiments with feature flags
Since feature flags allow us to change behavior with such a fine degree of control, they are the go-to method for conducting experiments. We can use feature flags to compare alternative versions of a feature.
Say you want to add a Call to Action and you have two alternatives. One is a short form and a button. The other, a single big round button. Which one gets more clicks?
So, you show half the users one version and half the other. After collecting usage data for a period, you can answer which one is better. One last switch will enable the winning alternative for everyone.
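As a rough sketch of that experiment, the code below splits users between the two alternatives and computes a click rate for each. It assumes users have numeric ids (so parity gives a stable 50/50 split) and that click events are collected as `{ userId, clicked }` records; the variant names and function names are made up for this example.

```javascript
// Assumption: numeric user ids, so even/odd gives a stable half-and-half split.
function variantFor(userId) {
  return userId % 2 === 0 ? "form-and-button" : "big-round-button";
}

// events: [{ userId, clicked }] -> click rate per variant
function clickRates(events) {
  const stats = {};
  for (const { userId, clicked } of events) {
    const variant = variantFor(userId);
    stats[variant] = stats[variant] || { shown: 0, clicks: 0 };
    stats[variant].shown += 1;
    if (clicked) stats[variant].clicks += 1;
  }
  const rates = {};
  for (const [variant, { shown, clicks }] of Object.entries(stats)) {
    rates[variant] = clicks / shown;
  }
  return rates;
}

// Picks the variant with the highest click rate (assumes at least one variant).
function winner(rates) {
  return Object.entries(rates).sort((a, b) => b[1] - a[1])[0][0];
}
```

A real experiment would also check that the difference is statistically meaningful before flipping the final switch.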
Feature toggles as operational switches
Not all toggles are temporary. Some of them can be permanent. Having a limited or "lite" version can be a life-saver in high-demand periods.
What is more, a flag can act as a kill switch to disable code that is causing a crash. Feature flags give the ops team a quick way of reacting to problems.
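A kill switch can be as simple as a wrapper that checks a flag on every call. The sketch below assumes an in-memory `opsFlags` object standing in for wherever the ops team flips switches; the `guarded` helper and the flag names are hypothetical.

```javascript
// In-memory stand-in for the place where ops toggles switches (hypothetical).
const opsFlags = { "recommendations-enabled": true };

// Returns a function that runs `feature` only while its switch is on;
// with the switch off, it returns `fallback` instead of touching the risky code.
function guarded(flagName, feature, fallback) {
  return (...args) => (opsFlags[flagName] ? feature(...args) : fallback);
}

const recommend = guarded(
  "recommendations-enabled",
  (product) => [`similar-to-${product}`], // placeholder for the real engine
  [] // switch off: the page simply shows no suggestions
);
```

Because the flag is read at call time, flipping `opsFlags["recommendations-enabled"]` to `false` takes effect on the next call without a redeploy.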
Flags as an alternative to branching
You've got a great idea for a new feature. What's your first instinct as a developer? To quickly pull the latest revision, create a new branch, and get to work. After about 30 minutes, you have a working prototype. All tests are passing and things are looking good.
Yet, you hesitate to integrate the branch into the main trunk because the feature is incomplete. There's still work to do. So you keep the branch isolated for a few days. What could go wrong, right?
By not merging the change right away, you've missed a vital moment. When the feature is finally ready, you realize the hidden cost of branching — teammates have merged their code in the meantime, the trunk has moved on, and you now have the work of fixing the conflicts ahead of you. You may need to redo some of the work in the best-case scenario. At the very worst, discard some of your changes.
The moral of the story is that you can't wait until a feature is complete to commit it to the main branch because the longer a branch is alive, the higher the chance of a conflict down the road. It's like playing a game; the further you advance without hitting a checkpoint, the more time you'll waste when you die and have to reload.
"If you merge every day, suddenly you never get to the point where you have huge merge conflicts that are hard to resolve." -- Linus Torvalds
Hopefully by now, it's clear why long-lived branches are bad. We must shrink them to the absolute bare minimum: if code is constantly being merged into the main trunk, there are few to no integration conflicts. Ideally, we should be merging about three or four times per day. At the very least, once per day. That way, you know you're walking on safe ground.
We must be comfortable with committing partial features into the main trunk. Here is where feature flags finally play their role. Feature flags let us share new code with the team while protecting users from viewing incomplete features.
Implementing feature flags
To visualize how feature flags work, let's imagine we are building an e-commerce site. We have a recommendation engine that picks suggestions based on the products users are browsing.
We think we can make a better engine by using machine learning. The hypothesis is that getting better suggestions will result in more sales. But getting the model right takes time, so we're not prepared to release it overnight. We want to have room for experimentation.
So, we deploy both versions of the engine:
// returns the active recommendation engine
function engineFactory() {
  let useML = false;
  //let useML = true; // UNCOMMENT TO ENABLE NEW ENGINE
  if (!useML) { // SINGLE TOGGLE POINT
    return classicRecomendationEngine();
  } else {
    return MLRecomendationEngine();
  }
}

let recommended_products = engineFactory()(viewed_product);
This pattern lets us have the toggle point in a single place. Decoupling the decision point from the rest of the code is essential; otherwise, we'll end up with if-then-else statements popping up all over the place.
Comments may work for quick experiments, but they're cumbersome. Can we do better? In their place, we can have a router that knows the state of each feature flag and returns the correct engine. As long as the interface to the engines is standardized, they are interchangeable. This pattern is called branch by abstraction.
import toggleRouter from "toggleRouter"

// toggle router for engine
function engineFactory() {
  // ask the imported router for the flag state
  if (toggleRouter.isFeatureEnabled("use-machine-learning-engine")) { // TOGGLE ROUTER
    return MLRecomendationEngine();
  } else {
    return classicRecomendationEngine();
  }
}

// instantiate the correct engine and use it
let recommended_products = engineFactory()(request.viewed_product);
Feature flags are not limited to true or false. We can have multivalue flags. For example, we can write a low-quality naive engine that uses fewer resources to handle traffic spikes.
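With a multivalue flag, the factory can look the engine up by name instead of branching on a boolean. The sketch below is one possible shape, assuming the flag's value is a string; the three engines are placeholders that return canned suggestions, and the fallback to the classic engine is a design choice of this example.

```javascript
// Placeholder engines: each takes a product and returns a list of suggestions.
const engines = new Map([
  ["classic", (product) => [`classic pick for ${product}`]],
  ["machine-learning", (product) => [`ML pick for ${product}`]],
  ["naive", () => ["bestseller"]], // cheap fallback for traffic spikes
]);

// `engineName` is the value of the multivalue flag;
// unknown or missing values fall back to the classic engine.
function engineFactory(engineName) {
  return engines.get(engineName) ?? engines.get("classic");
}
```

Adding a fourth engine then means adding one entry to the map, with no new conditionals at the toggle point.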
The more code paths we have, the more tests we'll need to write. A unit test for the recommendation engine should cover all the supported alternatives.
import toggleRouter from "toggleRouter"

describe("Test recommendation engines", function() {
  // callbacks take no arguments: a declared parameter would be treated
  // as a "done" callback and the test would wait for it to be called
  it("works with classic engine", () => {
    const recommended_products = engineFactory('classic')(current_product);
    // check results
  });
  it("works with ML engine", () => {
    const recommended_products = engineFactory('machine-learning')(current_product);
    // check results
  });
  it("works with naive engine", () => {
    const recommended_products = engineFactory('naive')(current_product);
    // check results
  });
});
Turning on feature flags
Now that we have our first feature flags coded, it's time to decide how to turn them on. The question boils down to two alternatives: at startup or during runtime.
Configuring flags at startup is the simplest and preferred solution unless you need dynamic toggling. You can store the status of all flags in a config file in the same repository as the project, or use environment variables or command-line switches.
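For instance, startup flags could be read from environment variables once, when the process boots. The snippet below is a sketch under a few assumptions: the `FEATURE_` prefix, the accepted truthy values, and the `loadFlags` name are conventions invented for this example.

```javascript
// Reads flags from variables such as FEATURE_USE_ML_ENGINE=true.
// The FEATURE_ prefix is a naming convention assumed for this sketch.
function loadFlags(env) {
  const flags = {};
  for (const [name, value] of Object.entries(env)) {
    if (!name.startsWith("FEATURE_")) continue;
    // FEATURE_USE_ML_ENGINE -> "use-ml-engine"
    const key = name.slice("FEATURE_".length).toLowerCase().replace(/_/g, "-");
    flags[key] = ["1", "true", "on"].includes(String(value).toLowerCase());
  }
  return Object.freeze(flags); // static flags never change after startup
}

// Evaluated once when the application starts.
const flags = loadFlags(process.env);
```

Freezing the result makes it explicit that changing a flag requires a restart with a different environment.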
Runtime toggles
Are static flags not flexible enough for you? You need to level up the complexity: you need a feature flag database. This lets you change settings on the fly, audit changes, and log feature utilization. But do not forget that you'll also have to gracefully manage flag status changes.
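To give an idea of what that involves, here is a small in-memory stand-in for a feature flag database. It is only a sketch: a real store would persist its data, and the `FlagStore` class, its method names, and the audit record fields are assumptions for this example.

```javascript
// In-memory stand-in for a feature flag database.
class FlagStore {
  constructor(initial = {}) {
    this.flags = new Map(Object.entries(initial));
    this.auditLog = [];
    this.listeners = [];
  }

  isEnabled(name) {
    return this.flags.get(name) === true; // unknown flags default to off
  }

  // Lets the application react gracefully when a flag flips at runtime.
  onChange(listener) {
    this.listeners.push(listener);
  }

  set(name, enabled, changedBy) {
    const previous = this.isEnabled(name);
    this.flags.set(name, enabled);
    // Every write is recorded, whether or not the value changed.
    this.auditLog.push({ name, previous, enabled, changedBy, at: new Date() });
    if (previous !== enabled) {
      this.listeners.forEach((listener) => listener(name, enabled));
    }
  }
}
```

Listeners registered with `onChange` are where the application would, for example, warm up the new engine before traffic starts hitting it.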
A somewhat less complex alternative is to control flags by request. That is, activate a feature when a given user logs in, or maybe it's a special day in the year, or when a request includes a special cookie or header. You can build the flag-selecting logic into the application.
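Such per-request logic might look like the function below. Everything specific in it is an assumption for illustration: the request shape, the `x-enable-new-engine` header, the `isBetaTester` field, and the hard-coded example date.

```javascript
// Hypothetical request shape: { user: { isBetaTester }, headers: {...}, date: Date }
function isEnabledForRequest(request) {
  const headers = request.headers || {};
  const date = request.date || new Date();

  // 1. Explicit opt-in via a special header (name assumed for this sketch).
  if (headers["x-enable-new-engine"] === "1") return true;

  // 2. Opt-in for a known group of users.
  if (request.user && request.user.isBetaTester) return true;

  // 3. Special day of the year: November 29 here (months are zero-based).
  return date.getMonth() === 10 && date.getDate() === 29;
}
```

Since the decision depends only on the request, no shared flag state has to be kept in sync across servers.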
Feature flags and CI/CD
Let's look at feature flags from the perspective of CI/CD. When you practice continuous integration, you have to decide how you will test feature flags. There are two ways to go about this:
- Build stage flags.
- Runtime flags.
In build stage flags, the state of all flags is known at build time. Since you define every flag at the beginning of the CI pipeline, you're testing the same build that is eventually deployed.
The flip side is that we have to rebuild and redeploy the application for a feature flag change to take effect.
📙 Stacks such as Docker and Kubernetes let you roll out application updates without interruption. Be sure to check our free ebook: CI/CD for Docker and Kubernetes.
Runtime flags and CI/CD
You must set flags either while deploying or at runtime if you don't do it in the build stage. In both cases, you can't know which features will be enabled at build time, so your pipeline may not be testing the same flag configuration that ends up running in production.
And because you can't test all possible toggle combinations, it's imperative to have sane defaults (the application should run well without any flags defined) and be smart about testing. You must plan carefully what toggle permutations might clash and write adequate tests.
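One way to approach this is sketched below: a table of safe defaults that the application falls back to, plus a helper that enumerates the on/off combinations of the few flags that might clash, so a test suite can loop over them. The flag names and helper names are assumptions for this example.

```javascript
// Every flag the application knows about, with a safe default.
const DEFAULTS = { "use-ml-engine": false, "lite-mode": false };

// Merges overrides onto the defaults, ignoring unknown flags and non-boolean values.
function resolveFlags(overrides = {}) {
  const resolved = { ...DEFAULTS };
  for (const name of Object.keys(DEFAULTS)) {
    if (typeof overrides[name] === "boolean") resolved[name] = overrides[name];
  }
  return resolved;
}

// Enumerates all on/off combinations of the given flags (2^n of them),
// which is only practical for the handful of flags that might interact.
function combinations(names) {
  return names.reduce(
    (acc, name) =>
      acc.flatMap((combo) => [
        { ...combo, [name]: false },
        { ...combo, [name]: true },
      ]),
    [{}]
  );
}
```

A test can then iterate over `combinations(["use-ml-engine", "lite-mode"])` and run the same assertions under each configuration.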
Recommendations for using feature flags
- Use them in moderation. Feature flags can get out of control.
- Don't ever, ever, ever repurpose a flag. Always create a new one for every change or risk making another half-billion-dollar mistake.
- Minimize flag debt. Delete old and unused flags. Don't let dead code stay long.
- Unless you have a good reason not to, keep the feature flag lifespan short (weeks).
- Choose descriptive names for your flags. New-Feature-2 is NOT a good name.
- Adopt trunk-based development and continuous delivery.
- Consider having a way to view the state of all your flags. You can write an admin dashboard or an API for this.
- Track and audit flag usage.
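A couple of these recommendations (viewing the state of all flags and keeping flag debt low) can be supported with very little code. The sketch below assumes a registry where each flag records an owner and a creation date; the sample entries, the 30-day threshold, and the function names are made up for illustration.

```javascript
// Hypothetical registry with sample data: each flag has an owner and a creation date.
const registry = [
  { name: "use-ml-engine", enabled: false, owner: "search-team", createdAt: new Date("2024-01-10") },
  { name: "lite-mode", enabled: true, owner: "ops", createdAt: new Date("2024-03-01"), permanent: true },
];

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the names of temporary flags older than `maxAgeDays`: candidates for removal.
function staleFlags(flags, now, maxAgeDays = 30) {
  return flags
    .filter((flag) => !flag.permanent)
    .filter((flag) => (now - flag.createdAt) / DAY_MS > maxAgeDays)
    .map((flag) => flag.name);
}

// One line per flag, e.g. for an admin page or an API response.
function describeFlags(flags) {
  return flags.map((flag) => `${flag.name}: ${flag.enabled ? "on" : "off"} (${flag.owner})`);
}
```

Running `staleFlags` in the CI pipeline is one possible way to get a regular reminder to delete old flags.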
Final thoughts
A feature flag adds to the system's complexity but brings more flexibility. You can ship code more frequently, test in production, and wow your users by revealing the feature at the right moment. Mastering feature flags is almost a requirement for trunk-based development and continuous delivery.