One of the top challenges organizations face when adopting the Cloud is cloud cost management. The move from CapEx to OpEx challenges both financial and technical teams to adapt their working methods to be efficient in a Cloud environment. This change of mindset is called FinOps. In one simple sentence: FinOps is DevOps for Fin. It is as much a culture to adopt, with its practices and methodologies, as a new job that needs sponsorship and skills.
FinOps principles and organization
The FinOps Foundation defines FinOps as:
“FinOps is an evolving cloud financial management discipline and cultural practice that enables organizations to get maximum business value by helping engineering, finance, technology and business teams to collaborate on data-driven spending decisions.”
One very important thing stands out in this definition: FinOps deals not only with cloud cost but with business value, and profitability is in fact the more relevant measure. It does not matter if cloud costs surge as long as the business surges too. The real purpose of FinOps is to increase margins by decreasing Cloud cost per client. For example, a 20% increase in Cloud cost due to a new client on your solution does not matter if the revenue that client brings exceeds the added cost.
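To make this arithmetic concrete, here is a minimal sketch. All figures and the `margin` helper are made up for illustration; they do not come from a real bill. It compares the margin before and after a new client raises the cloud bill by 20%.

```python
def margin(revenue: float, cloud_cost: float) -> float:
    """Margin left once the cloud bill is paid (other costs ignored)."""
    return revenue - cloud_cost


# Hypothetical monthly figures before onboarding the new client.
revenue_before, cost_before = 50_000.0, 10_000.0

# The new client raises the cloud bill by 20% and brings extra revenue.
cost_after = cost_before * 1.20
revenue_after = revenue_before + 5_000.0

margin_before = margin(revenue_before, cost_before)
margin_after = margin(revenue_after, cost_after)

print(f"cloud cost increase: {cost_after - cost_before:.0f}")
print(f"margin before: {margin_before:.0f}, margin after: {margin_after:.0f}")
```

In this made-up case the bill grows, but the margin grows too, which is the outcome FinOps cares about.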
FinOps activities sit in the middle of Financial and Operational (thanks, Captain Obvious). This means that technical teams (Dev, DevOps, Ops) and financial teams need to raise their skills to be able to talk to each other. On the one hand, technical teams need to be able to talk about cloud cost forecasts, commitment discounts, time and usage optimizations and rightsizing. On the other hand, financial teams need to be able to talk about Cloud bill anatomy, budgets and alerting, bills that vary from month to month, and Agile. This can lead to a culture shock if these teams lack a shared vocabulary to communicate and processes to collaborate.
To establish these practices, we introduce a dedicated team, the FinOps team, which is responsible for collaboration between teams and for transverse FinOps actions (global commitments, for example). In small organizations, this team can be a single person, even a CTO/CFO or a project manager. It is very important that this team is not responsible for every FinOps action in the company. Like DevOps, FinOps is more a cultural approach than a job in itself. Centralization can be applied to specific tasks, but the main purpose of this team is to decentralize and infuse FinOps into the technical and financial teams.
FinOps methodologies
FinOps is driven by two key methodologies:
- Phases: to establish an Agile cycle and iterate with the teams closest to product development
- Maturity model: to go step by step and avoid the risk of doing it badly
Phases: Inform, Optimize, Operate
Like Agile with its ceremonies, FinOps proposes an iterative framework for continuous activities and improvements. The FinOps cycle is split into 3 phases:
- Inform: Where everything begins! This first phase empowers people on FinOps by giving them visibility into their cloud costs. The company needs to establish a cost allocation policy, budgets and forecasts, all visible to stakeholders.
- Optimize: The billing impact! This is the phase where cost decisions are taken. Can we commit to a cloud provider's resource? Can we rightsize a resource? Do we have enough automation to scale up and down, or to shut down unused or useless Cloud resources? This phase has to lead to actions that reduce the overall cost of your Cloud usage, both in a centralized way (the FinOps team takes a global commitment, for example) and in a decentralized way (a tech team rightsizes a resource on its project).
- Operate: Crossing Cloud cost with the business. This last phase lets you step back from the cloud costs themselves and look at how they serve business cases. It needs to identify the overall business impact, the margin gained, and user satisfaction versus cost, using metrics such as latency or SLA projected onto the bill. It is also the opportunity to define new business cases and project them onto cloud cost, to find out at the engineering stage whether a business case is sustainable or not.
Maturity model: Crawl, walk, run
“Crawl, Walk, Run” is a metaphor for starting small and scaling iteratively to gain confidence in our FinOps actions. The purpose of this maturity model is to avoid being paralyzed by the fear of doing it wrong, and to avoid taking unrecoverable risks in FinOps activities.
The first step, “Crawl”, is the kick-off: FinOps begins with little reporting and tooling, basic KPI measurement, basic processes to follow and an understanding built in only some parts of the teams. It will lead to concrete actions like being able to allocate costs on 50% of your cloud resources or commit on 50% of your cloud resource usage.
The second step, “Walk”, is the intermediate step, with automation and established processes to follow. You start to tackle edge cases by identifying them and estimating how to resolve them. It will lead to concrete actions like being able to allocate costs on 80% of your cloud resources, commit on 70% of your cloud resource usage or have a 15% variance on cost forecasting.
The last step, “Run”, is the gold standard to reach, with automation wherever possible, difficult edge cases tackled and a global FinOps culture adopted and shared by everyone. It will lead to concrete actions like being able to allocate costs on more than 90% of your cloud resources, commit on 80% of your cloud resource usage or have a 12% variance on cost forecasting.
Reaching this last step does not mean that FinOps activities are over. It just means that your organization is proficient and needs to sustain what it has achieved through the continuous phases Inform, Optimize, Operate.
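As a rough self-assessment aid, the thresholds quoted above can be turned into a small check. This is only a sketch: the `FinOpsKpis` structure and the choice to require all three KPIs at once are our assumptions, since the maturity model gives these numbers as examples rather than strict rules.

```python
from dataclasses import dataclass


@dataclass
class FinOpsKpis:
    allocated_pct: float          # share of cloud cost allocated to an owner
    committed_pct: float          # share of usage covered by commitments
    forecast_variance_pct: float  # gap between forecast and actual cost


def maturity(kpis: FinOpsKpis) -> str:
    """Map KPIs to a maturity step using the example thresholds from the text."""
    if (kpis.allocated_pct > 90 and kpis.committed_pct >= 80
            and kpis.forecast_variance_pct <= 12):
        return "run"
    if (kpis.allocated_pct >= 80 and kpis.committed_pct >= 70
            and kpis.forecast_variance_pct <= 15):
        return "walk"
    # Anything below the "walk" thresholds is treated as "crawl" (assumption).
    return "crawl"


print(maturity(FinOpsKpis(allocated_pct=85, committed_pct=72, forecast_variance_pct=14)))
```

An organization can be at different steps for different KPIs, so in practice it may be more useful to assess each KPI separately.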
Let's start on FinOps!
In the previous sections, we described a general framework for embracing FinOps in your organization. Here, we propose a concrete roadmap to start these activities right now:
1. Labeling, labeling, labeling
There is no allocation without labeling your Cloud resources. Labeling is the keystone for aggregating billing records and allocating them to projects, teams, cost centers, clients, …
To drive the labeling, you need to identify the dimensions of your billing that you want to explore.
- Do you want to allocate costs based on projects?
- Do you have projects which hold shared resources (backend services, network, SaaS, …)?
- Do you want to allocate these shared resources based on each team's usage or as a whole?
- Do you have very different groups of solutions hosted on the Cloud (internal software vs client software, app for business A vs app for business B)?
- Do your costs need to be reported only to tech? To procurement as well? To the C-level?
The answers to these questions should define the label taxonomy to set up in order to explore cost data along several dimensions.
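Once a taxonomy is chosen, it helps to check resources against it automatically. The sketch below assumes a hypothetical taxonomy of four label keys and a hand-written inventory; in practice the inventory would come from your cloud provider's asset or billing export.

```python
# Hypothetical taxonomy: adapt the keys to the dimensions you identified.
REQUIRED_LABELS = {"project", "team", "cost_center", "environment"}


def missing_labels(resource_labels: dict) -> set:
    """Return the required label keys that are absent or empty on a resource."""
    present = {key for key, value in resource_labels.items() if value}
    return REQUIRED_LABELS - present


# Made-up inventory for illustration.
resources = {
    "vm-frontend-1": {"project": "shop", "team": "web",
                      "cost_center": "cc-42", "environment": "prod"},
    "bucket-logs": {"project": "shop", "team": ""},
}

for name, labels in resources.items():
    gaps = missing_labels(labels)
    status = "ok" if not gaps else "missing: " + ", ".join(sorted(gaps))
    print(f"{name}: {status}")
```

A check like this can run in CI or on a schedule so that unlabeled resources are caught before they show up as unallocated cost.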
2. Allocate your costs and show back to projects
From these labels, you will be able to define cost allocation and the metrics and dashboards to follow your Cloud costs. These can be pure Cloud cost metrics such as month-over-month comparison, monthly forecasting, annual consolidation and discounts gained, but also business metrics such as cost per client, sales margin vs cloud cost, client satisfaction against cost evolution, cost per 9 of SLA, …
There are several ways to help build these metrics and dashboards. The hyperscalers (AWS, Azure and GCP) offer managed databases to which you can export billing data. You can then exploit this data with triggers that alert people to cost surges, with connectors to data visualization solutions, or even with data enrichment to dig deeper into these costs.
All these dashboards need to be user-centric and adapted to every persona impacted by FinOps activities. For example, to a Site Reliability Engineer you will need to show the bill against the number of incidents, the availability zone strategy, the scalability factors and the user traffic on the applications. You need to show teams the cost of their own Cloud usage to empower them to reduce it. Once this first step is reached, you can also consider a chargeback mechanism to make them responsible for their own Cloud budgets. To gain this autonomy, cost allocation needs to be as complete as possible (more than 90% of your costs). This level of maturity indicates that you understand your costs really well and that you can report them internally.
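The allocation and showback logic described above can be sketched in a few lines. The billing rows here are invented and much simpler than a real billing export; the `allocate` and `allocation_coverage` helpers are illustrative names, not part of any provider's tooling.

```python
from collections import defaultdict

# Made-up billing rows; a real export has many more fields.
billing_rows = [
    {"cost": 400.0, "labels": {"project": "shop"}},
    {"cost": 250.0, "labels": {"project": "analytics"}},
    {"cost": 300.0, "labels": {"project": "shop"}},
    {"cost": 50.0, "labels": {}},  # unlabeled resource
]


def allocate(rows, key="project"):
    """Sum cost per label value; rows without the label go to 'unallocated'."""
    totals = defaultdict(float)
    for row in rows:
        owner = row["labels"].get(key, "unallocated")
        totals[owner] += row["cost"]
    return dict(totals)


def allocation_coverage(rows, key="project"):
    """Share of total cost that carries the label (0.0 to 1.0)."""
    total = sum(row["cost"] for row in rows)
    allocated = sum(row["cost"] for row in rows if key in row["labels"])
    return allocated / total if total else 0.0


print(allocate(billing_rows))
print(f"coverage: {allocation_coverage(billing_rows):.0%}")
```

The coverage figure is the one to compare against the allocation targets of the maturity model (50%, 80%, more than 90%).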
3. Commitment by the transverse FinOps team
One Cloud cost optimization is completely agnostic of the projects: it can be carried out by the transverse FinOps team itself, without the other technical teams. Cloud provider commitments allow organizations to commit for 1 or 3 years to a quantity of usage of a resource. This quantity can be a number of vCPUs, a number of minutes for a virtual machine, a number of queries or a number of GB of outbound traffic. These commitments are specific mechanisms, publicly documented by each hyperscaler, each with its own approach. For example, for virtual machines, on Google Cloud you commit on the RAM and vCPU behind a flavor, and on AWS you commit on the flavor itself. You can also negotiate with the Cloud provider a specific volume commitment on a resource like outbound traffic, or even on the global annual bill.
These commitments are not really a technical matter. They depend only on how far your company can project itself with the cloud provider over 1 or 3 years, and on how mature the company is in its usage of the provider's services (for example, are the VM flavors you use stable enough in your infrastructure to commit on them?). This is why these commitments are taken by the FinOps team, without touching anything in the projects. They are strong levers that can save between 20% and 70% on the cost of committed resources. The trade-off is that you need predictable costs, or you need to take into account the cost variability of the resources covered by the commitments.
These commitments have to follow the evolution of your Cloud usage to stay relevant. Taking the last bill, extracting the exact resource consumption and committing annually on the complete volume is a very bad idea. If you do not use the full volume that you committed, it makes no difference: you still pay for the committed resources! A good practice is to start with a commitment on a baseline close to 60% of your usage and iteratively revise it with new commitments. With experience, you will get the most out of these commitments by balancing the risk of not using them against the gain they provide. There is a pivot point where the overall gain from the commitment covers the risk of not using all of it. It can be worthwhile to commit at 90% of usage while knowing that you will actually use between 80% and 100% of it during the commitment period. You will gain less than by committing exactly the right volume, but you will gain more than by committing at 80% and using 85%, for example.
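The trade-off between commitment level and actual usage can be explored with a simple cost model. This sketch assumes a flat 30% commitment discount and a unit price of 1, both made up; real discounts depend on the provider, the resource and the term. Committed units are paid whether used or not, and usage above the commitment is billed on demand.

```python
def period_cost(usage: float, committed: float,
                unit_price: float = 1.0, discount: float = 0.30) -> float:
    """Cost for one period: discounted commitment plus on-demand overflow."""
    committed_cost = committed * unit_price * (1 - discount)  # paid even if unused
    overflow = max(0.0, usage - committed)                    # billed on demand
    return committed_cost + overflow * unit_price


def savings(usage: float, committed: float,
            unit_price: float = 1.0, discount: float = 0.30) -> float:
    """Savings compared with paying everything on demand (negative = loss)."""
    return usage * unit_price - period_cost(usage, committed, unit_price, discount)


def break_even_usage(committed: float, discount: float = 0.30) -> float:
    """Usage below which the commitment costs more than pure on-demand."""
    return committed * (1 - discount)


# Compare several commitment levels against several usage scenarios.
for committed in (60, 80, 90):
    row = ", ".join(f"usage {u}: {savings(u, committed):+.1f}" for u in (80, 90, 100))
    print(f"commit {committed} -> {row} (break-even at {break_even_usage(committed):.0f})")
```

Running it with your own discount and usage range shows where the pivot point sits: a higher commitment pays off only if usage lands high enough in the expected range.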
The last question to ask yourself about commitments is how their savings are passed on internally in your company:
- Do you share the bill reduction with the teams impacted by them?
- Do you only show them to demonstrate the work of the FinOps team?
- Do you reinvest these reductions in proofs of concept or R&D budget? Do you use them to self-fund the FinOps team?
There is no standard way to do it; it depends heavily on the practices in your company.
4. Time and size optimizations with the teams
When we talk about optimizing the cost of a project, we also need to zoom in on the project itself and not only on the financial aspect. The main questions to ask yourself in this context are:
- Are we efficient in our Cloud usage, and can we identify sources of optimization?
- Do we rightsize our instances to pay the right price for the performance we need?
- Can we minimize calls to managed services by batching them (log aggregation, blocks of queries, …)?
- Have we defined scaling policies with caps to avoid cost spikes?
- Do we have retention policies on data in buckets, logs and backups, to cap storage costs over time?
All these questions need to be addressed jointly by the FinOps team and the technical team involved in the project, to help the latter integrate FinOps into its daily work. The answers also have to stay very close to the project's usage, because optimization decisions can have an impact on the business. For example, if an SLA is guaranteed in the project and we define scaling policies, are we sure that we can still meet the SLA? If we have user data in a bucket and we are talking about data retention, are we sure that we do not break a user contract or even a regulation?
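As an illustration of the rightsizing question, here is a minimal sketch that suggests an instance size from observed peak CPU usage. The list of sizes, the 70% target utilization and the `rightsize` helper are all assumptions for the example; the headroom left by the target is precisely the knob that protects the SLA.

```python
# Hypothetical vCPU sizes offered for an instance family.
AVAILABLE_VCPUS = [1, 2, 4, 8, 16, 32]


def rightsize(current_vcpus: int, peak_cpu_utilization: float,
              target_utilization: float = 0.70) -> int:
    """Smallest size that keeps the observed peak under the target utilization."""
    # vCPUs actually consumed at peak, scaled up to leave headroom.
    needed = current_vcpus * peak_cpu_utilization / target_utilization
    for size in AVAILABLE_VCPUS:
        if size >= needed:
            return size
    return AVAILABLE_VCPUS[-1]  # already at the largest size


# A 16 vCPU instance peaking at 20% CPU is a candidate for downsizing.
print(rightsize(current_vcpus=16, peak_cpu_utilization=0.20))
```

CPU is only one dimension; memory, I/O and the length of the observation window should be reviewed with the project team before acting on such a suggestion.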
5. Empower the teams in time
The last pillar, to me, is to develop each stakeholder's ability to feel empowered on FinOps. As a FinOps team, give them responsibility, for example by configuring budgets and alerting on the scope of their projects, define cost optimization targets with them and support them in their FinOps activities. The main difficulty is to keep these activities going. As with technical debt, there are periods of more or less focus on them, but in the end they have to become recurrent.
To reach this objective, equip yourself with tooling that delivers enriched alerts to the right people, in the right place, at the right time. For example, Spendlr can use your billing export on GCP to send customized notifications to users in Slack channels. The project is open source and connectors can be added according to your needs (AWS, Teams, …)
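To show what budget alerting per project scope can look like, here is a generic sketch. It is not how Spendlr or any cloud provider implements alerts: the thresholds, the channel name and both helper functions are assumptions for illustration.

```python
def crossed_thresholds(spend: float, budget: float,
                       thresholds=(0.5, 0.8, 1.0)) -> list:
    """Return the budget thresholds (as ratios) already reached by the spend."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in thresholds if ratio >= t]


def alert_message(project: str, channel: str, spend: float, budget: float):
    """Build a notification for the owning team, or None if nothing is crossed."""
    crossed = crossed_thresholds(spend, budget)
    if not crossed:
        return None
    return (f"[{channel}] {project}: {spend:.0f} spent, "
            f"{max(crossed):.0%} threshold of the {budget:.0f} budget reached")


# Hypothetical project, channel and amounts.
print(alert_message("shop", "#team-web", spend=850.0, budget=1000.0))
```

Routing each message to the channel of the team that owns the project is what puts the alert in front of the right people.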
Tomorrow, I am starting!
In this article, we have tried to lay the foundations of FinOps practices and describe the first steps to take in order to start them. The 5 pillars given are not so simple to implement, but they need to be the target. The very first steps are simple:
- Assess what you already have in your company. Maybe you already have people trained on FinOps, or some FinOps initiatives in a team. Bring them on board for this challenge!
- Go to your company's billing console and assess what you pay, the reports already built, the budgets and alerts already set up, and the commitments proposed by the cloud provider
- Define the first pillar that you want to address (spoiler alert: the order of presentation is, from my point of view, the order of priority)
- Do not hesitate to ask consulting companies to audit your practices to get feedback on what you have done and on the next steps.