That's it! We are going to migrate our on-premise applications to the public cloud! We start by deploying a first Web application to please the business, we set up auto scaling and backups to satisfy the ops team, we encrypt the storage and the database with a key and add a firewall to reassure the security team... and the costs end up significantly higher than what the business initially expected.
Switching to the Public Cloud represents a major change within a company. A phase of adoption and acculturation is necessary for a successful transition to the Cloud at the fairest cost.
Have you also discovered hidden costs in the cloud? Discover how to manage and optimize these costs with different optimization strategies, and how to adopt the right behaviors during development.
Why public cloud?
There are thousands of articles on the internet talking about "Why move to the Cloud?". Some try to convince you to go, others to dissuade you.
Let's see the primary reasons for using public cloud:
Scalability remains the best reason to use cloud technologies, followed by agility and cost reduction.
During the cloud migration, the biggest challenges for organizations engaged with public cloud are in security operations.
But once you are in production, cost becomes the top priority. Cloud cost savings has been the top initiative for the 4th year in a row, and it will probably be the same this year.
Why?
Why does cloud computing seem so expensive for most companies?
Moving your workloads to the cloud (or multiple clouds) is supposed to save you money. So why are your computing expenses piling up?
Cloudcheckr
Primary reasons
There are many reasons for cost overruns:
- Which cloud migration strategy was used: re-hosting, re-platforming or re-architecting?
- Have we taken the Cloud Adoption Framework into account?
- Has there been a misunderstanding of Operating Expense vs Capital Expenditure?
- Were we looking for short-term results or was a long-term vision established?
- No cost-driven design. No cost-driven development.
Let's take a fictitious use case. zShop is an e-commerce company specialized in online sales. Last month, zShop finished its migration to the Cloud. As for 74% of companies, optimizing cost has become its top challenge. This migration introduced a new activity: Cost Management [1].
Everyone in that company has to take part in this new activity and it takes time!
Before that, they worked on more interesting things related to their own area of activity.
Those who are most affected by this new activity are the architects and developers who must continuously optimize costs!
Alice, a Cloud Architect, has been tasked by the CEO to review the company's cloud architectures.
Before diving into the source code and existing architectures, Alice first wants to see if they could have done better during the migration.
Cost Driven Design
Cost driven design is a common name in manufacturing and other industries for designing a product according to cost & usage. The approach can also be applied in a cloud context.
Applying cost driven design rules is a big challenge because you need to consider:
- Business objectives
- High availability
- Maintainability
- Cost effectiveness
➡️ These goals often compete!
It became immediately apparent to Alice that these considerations were ignored during the migration.
There are many tools on the market that can help understand existing cost & usage, such as Embotics, Flexera, Cloudcheckr and Cloudability.
To avoid adding more cost to the company by subscribing to a SaaS solution, she resolved to try to understand cloud billing.
Cloud cost & usage
Cloud providers offer two types of resource charges: Time based and Consumption based.
Time based vs consumption based
Time based
- Execution time
Consumption based
- CPU Utilization
- Memory Utilization
- Disk I/O
- Data Transfer
- Requests
- Data Stored
- Total Cost $
Sometimes you are billed on both!
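To make the "billed on both" case concrete, here is a minimal sketch of a bill that combines a time based charge and a consumption based charge for a single resource. The function name `monthly_charge` and all rates are hypothetical and do not match any provider's price list.

```python
from decimal import Decimal


def monthly_charge(hours_running, hourly_rate, gb_transferred, rate_per_gb):
    """Return (time_based, consumption_based, total) for one resource.

    All rates are illustrative assumptions, not real cloud prices.
    """
    time_based = Decimal(str(hours_running)) * Decimal(str(hourly_rate))
    consumption_based = Decimal(str(gb_transferred)) * Decimal(str(rate_per_gb))
    return time_based, consumption_based, time_based + consumption_based


# Hypothetical example: an instance running 720 hours at $0.10 per hour
# that also sends out 500 GB of data at $0.09 per GB.
time_part, usage_part, total = monthly_charge(720, "0.10", 500, "0.09")
print(f"time based: ${time_part}, consumption based: ${usage_part}, total: ${total}")
```

Even with these made-up numbers, the consumption based part is a large share of the total, and it is the part you cannot read off the instance type in advance.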
Let's see an example with Netflix [2]:
On Netflix, Stranger Things season 2 is shot in 8K and has nine episodes.
The source files are terabytes of data.
Each season required 190,000 CPU hours to encode.
That's the equivalent of running 2,965 m4.16xlarge instances for an hour.
An m4.16xlarge is priced at $3.2 per hour, so that hour of encoding costs Netflix a total of $9,488.
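The arithmetic behind these figures can be checked with a short sketch. The instance count and hourly price come from the text; the 64 vCPUs per m4.16xlarge used for the cross-check is an assumption added here, and the variable names are purely illustrative.

```python
# Figures quoted in the text.
CPU_HOURS = 190_000            # CPU hours needed to encode one season
INSTANCES = 2965               # m4.16xlarge instances running for one hour
PRICE_CENTS_PER_HOUR = 320     # $3.2 per instance-hour, in cents to avoid float rounding

# Assumption: an m4.16xlarge exposes 64 vCPUs.
VCPUS_PER_INSTANCE = 64


def encoding_cost_dollars(instances, price_cents_per_hour, hours=1):
    """Cost in dollars of running `instances` machines for `hours` hours."""
    total_cents = instances * price_cents_per_hour * hours
    return total_cents / 100


# Cross-check: CPU hours divided by vCPUs gives the instance-hours needed.
instance_hours = CPU_HOURS / VCPUS_PER_INSTANCE
cost = encoding_cost_dollars(INSTANCES, PRICE_CENTS_PER_HOUR)

print(f"instance-hours needed: about {instance_hours:,.0f}")
print(f"cost for {INSTANCES:,} instances for one hour: ${cost:,.0f}")
```

Under that assumption the cross-check gives roughly 2,969 instance-hours, in line with the 2,965 instances quoted.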
What if Netflix had chosen a cloud other than AWS?
As we can see, a strict comparison between clouds is not relevant: the price changes depending on the instance type.
Alice realized that the billing service helps to understand the cost of cloud services, not the cost of business applications.
From a business application point of view, there are two different costs: mandatory cost and operational cost.
Mandatory cost vs operational cost
Mandatory costs are all the costs that you are aware of in advance. By creating a new instance with a specific instance type, you know in advance how much you'll be charged. They are also called acquisition costs or visible costs.
- instance hours
- instance type
- volume type
- ...
The following diagram illustrates the mandatory cost. Alice notices that if there is no network traffic, we will only be billed on the allocated resource. She also notices that in this scenario, using serverless applications saves money.
Operational costs are the costs incurred during the operation of the application. You won't know in advance how many user requests the app will receive or how much data you will store tomorrow. They are also called hidden costs.
- storage
- data transfer
- disk I/O
- requests
- ...
The following diagram illustrates the operational cost. This time Alice notices that if there is network traffic, we will be billed on the allocated resource but also on serverless resources. Network traffic could also be charged depending on the network topology.
As zShop has internet-facing applications, Alice understood that operational cost could be the highest cost.
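The split between mandatory and operational costs can be sketched as a simple classification of billing line items. The category sets follow the two lists above; the bill itself and the function name `split_costs` are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict

# Cost categories taken from the two lists in the text.
MANDATORY = {"instance hours", "instance type", "volume type"}
OPERATIONAL = {"storage", "data transfer", "disk I/O", "requests"}


def split_costs(line_items):
    """Group (item, amount) pairs into mandatory and operational totals."""
    totals = defaultdict(float)
    for item, amount in line_items:
        if item in MANDATORY:
            kind = "mandatory"
        elif item in OPERATIONAL:
            kind = "operational"
        else:
            kind = "unclassified"
        totals[kind] += amount
    return dict(totals)


# Hypothetical monthly bill for one internet-facing application, in dollars.
bill = [
    ("instance hours", 70.0),
    ("volume type", 10.0),
    ("data transfer", 90.0),
    ("requests", 25.0),
    ("storage", 5.0),
]

totals = split_costs(bill)
operational_share = totals["operational"] / sum(totals.values())
print(totals)
print(f"operational share: {operational_share:.0%}")
```

A view like this is per application rather than per cloud service, which is the perspective the billing service does not give Alice out of the box.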
Different types of cost-overspending
The cost & usage differs from one business strategy to another. There are:
- Storage: Businesses depending on applications that store massive amounts of data like satellite images, logs, data lakes, etc.
- Compute: Businesses depending on applications that use intensive computing like SAP, Elasticsearch, graph databases, big data processing, building machine learning models, etc.
- Disk: Businesses depending on applications that use disk I/O heavily like unmanaged databases or unmanaged NFS.
- Fully managed services: Businesses depending on applications that use fully managed services like AI, video, IoT or data analytics services.
- Network: Businesses depending on applications that have public APIs, are internet facing, use inter-regional data transfer or communicate with external workloads (e.g. Hybrid Cloud).
So Alice investigated how to track those data transfer costs [1].
She noticed that there is no common pattern or best practice for controlling network traffic, other than using standard services like Amazon CloudWatch to create alarms based on egress data.
Optimizing existing workloads can start with adopting a cloud culture that helps avoid wasting resources.
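As an illustration of such an alarm, the sketch below builds the arguments for a CloudWatch alarm on the EC2 `NetworkOut` metric and passes them to boto3's `put_metric_alarm`. The instance ID, threshold, SNS topic ARN and helper names are hypothetical, and this is one possible setup rather than what Alice actually deployed.

```python
def egress_alarm_params(instance_id, threshold_bytes, sns_topic_arn):
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"high-egress-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "NetworkOut",       # bytes sent out by the instance
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Sum",
        "Period": 3600,                   # evaluate one-hour windows
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify this topic when the alarm fires
    }


def create_egress_alarm(params):
    """Send the alarm definition to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the sketch can be read and run without AWS access

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical values: alert when one instance sends more than 50 GiB in an hour.
params = egress_alarm_params(
    "i-0123456789abcdef0",
    50 * 1024**3,
    "arn:aws:sns:eu-west-1:123456789012:cost-alerts",
)
print(params["AlarmName"], params["Threshold"])
```

Calling `create_egress_alarm(params)` would create the alarm in the account configured for boto3. An alarm like this only detects the traffic; it does not cap the bill.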
"35 percent of cloud bills are wasted due to inefficiencies"
RightScale "State of the Cloud" 2018
Cost Culture
Culture consists of the attitudes and behaviors that define how a business operates. [3]
Each employee in the organisation should endorse this culture and see the change as normal. Engineering, finance, operations, business development teams and C-level executives all need to consider cost optimization.
All cloud providers give guidance on cost optimization best practices and lessons learned from their customers.
In AWS, Cost Optimization is one of the five pillars of the Well-Architected Framework.
"A cost-optimized system will fully utilize all resources, achieve an outcome at the lowest possible price point, and meet your functional requirements"
The objective of such a framework is to help customers go from...
Pay for what you use
to...
Pay for what you need
So far Alice has learned the following lessons:
- Cost driven design approach has not been adopted.
- The current cloud billing dashboard does not help you control the cloud cost & usage of the business applications.
- The current culture needs to be improved.
Now Alice can look at the architectures put in place and take a closer look at the cost impact.
The myth
Alice started to evaluate an internal Web application migrated from on-premise following a lift & shift strategy.
At first the developers accessed the app through a public IP address, and the architecture looked like this.
After some testing, they estimated that the application could be deployed in a staging environment to be tested by the sales team. The only thing to add is a domain name with a DNS record pointing to the public IP address of the instance.
Alice remembered that, before the deployment to the staging environment, a colleague from the Ops team with a long beard and long hair had stepped in.
"You cannot deploy in a staging environment without autoscaling and a load balancer enabled!"
He was shortly joined by a bald security colleague with round black glasses.
"You cannot deploy your instance in a public subnet. It must be in a private subnet. The Web application must be secured with SSL certificates and user authentication!"
After deploying all these new resources in the infrastructure and receiving the business validation, the developer team wanted to deploy to production. Once again, the security and ops teams said: "Wait!!"
Security team: "You cannot deploy to production if you do not encrypt the data and save the database passwords in a secret database!"
Ops team: "You cannot deploy to production if you do not create backups for your persistent disks!"
The network team that we haven't seen from the start appears with red caps and says, "No production possible if a CDN is not in front of your Web application AND a WAF attached to the CDN and the load balancer with firewall rules allowing only our network IP ranges".
The development team started by using 4 services to perform the migration and ended up with 14 services!
Alice noticed that 40% of the cost impact was unrelated to business need.
So Alice wanted to analyze a fully Serverless Internet application to see if there had been a difference.
This time too, the security, ops and network teams asked for the same services to be deployed, so the impact on costs was the same!
But this time the cost impact on business applications was different using Serverless services.
Alice concluded that the estimate of the cloud migration made before the migration based on the on-premise infrastructure resource usage was not sufficient at all. The reality was different.
Alice wrote a report and sent it to the corporation.
The reality
In the public cloud, there will be necessary costs for security & compliance reasons (e.g. firewall, auditing, encryption, 3rd-party) and disaster recovery (e.g. backup, snapshots, replication (fault tolerance), redundancy)
Necessary costs for devops
Unpredictable costs on network
- Data transfer egress, cross-regions, cross-accounts, cross-platforms.
Unpredictable costs due to human errors
- Infinite loop, bad architecture, accidents
Unpredictable costs during application operations
- Requests, data stored, data retrievals, events, throughput, disk I/O...
Unpredictable costs incurred by 3rd party tools, that we cannot optimize
- Open source, docker images, commercial licence...
But we will have estimable costs like mandatory costs for compute, block storage, allocation, etc.
And estimable costs from cloud SaaS solutions (zero code) for AI, IoT, Data Analytics, Media, Mobile, Game, etc.
Save money & time
What I always recommend to customers is using:
- Managed services as much as possible
- Dedicated network connections
- The right architecture for the right business need
- Content delivery networks
- Storage lifecycle
- Cloud discounts
- Infrastructure as code
So:
- Use Serverless as much as possible for business applications. Otherwise use PaaS services.
- Use managed services as much as possible for non-business tools like databases, load balancers, data/server migration, messaging, building ML models, data analytics, big data processing, etc. Otherwise use managed container services.
Unless you are looking for portability [4]:
"If you believe in the cloud, you can't be agnostic."
Final words
The business creates the cost, the network propagates the cost.
If we had a cost view like this for our business applications, our lives would be easier.
I wrote an article for the JDN about this concept but in French.
Cloud : le métier crée le coût, le réseau le propage
If you have any questions or feedback, please feel free to leave a comment.
Otherwise, I hope I have helped you answer some of the hard questions about cloud cost.
By the way, do not hesitate to share with peers!
Thanks for reading!
Documentation:
[1] https://www.zdnet.com/article/cloud-customers-pairing-aws-microsoft-azure-more-according-to-kentik/
[2] https://www.simform.com/compute-pricing-comparison-aws-azure-googlecloud/
[3] https://medium.com/faun/cloud-cost-optimization-add-value-save-money-5dc4a16b08fe
[4] https://www.contino.io/insights/true-cost-being-cloud-agnostic
[5] https://cloudcheckr.com/cloud-cost-management/3-ways-to-reduce-cloud-costs/