Cloud Cost - Separating Myth from Reality

Chabane R. - Jan 20 '21 - Dev Community

That's it! We are going to migrate our on-premise applications to the public cloud! We start by deploying a first Web application to please the business, we set up auto scaling and backups to satisfy the ops team, and we encrypt the storage and the database with a key and add a firewall to reassure the security team... and costs end up significantly higher than what the business initially expected.

Switching to the Public Cloud represents a major change within a company. A phase of adoption and acculturation is necessary for a successful transition to the Cloud at the fairest cost.

Have you also discovered hidden costs in the cloud? Discover how to manage and optimize these costs with different optimization strategies, and how to adopt the right behaviors during development.

Why public cloud?

There are thousands of articles on the internet talking about "Why move to the Cloud?". Some try to convince you to go, others to dissuade you.

Let's look at the primary reasons for using the public cloud.

Scalability remains the top reason to use cloud technologies, followed by agility and cost reduction.

During the cloud migration, the biggest challenges for organizations adopting the public cloud are in security operations.

But once you are in production, cost becomes the top priority. Cloud cost savings has been the top initiative for the 4th year in a row, and it will probably be the same this year.

Why?

Why does cloud computing seem so expensive for most companies?

"Moving your workloads to the cloud (or multiple clouds) is supposed to save you money. So why are your computing expenses piling up?"
– Cloudcheckr [5]

Primary reasons

There are many reasons for cost overruns:

  • Which cloud migration strategy was used: re-hosting, re-platforming or re-architecting?
  • Was the Cloud Adoption Framework taken into account?
  • Was there a misunderstanding of operating expense vs capital expenditure?
  • Were we looking for short-term results, or was a long-term vision established?
  • Was there no cost-driven design and no cost-driven development?

Let's take a fictitious use case. zShop is an e-commerce company specialized in online sales. Last month, zShop finished its migration to the Cloud. Just like for 74% of companies, optimizing cost has become the top challenge. A new activity has been introduced with this migration: Cost Management [1].

Everyone in the company has to take part in this new activity, and it takes time!

Before that, they were doing more interesting things related to their own area of activity.

Those who are most affected by this new activity are the architects and developers who must continuously optimize costs!

Alice 🙋🏻‍♀️, a Cloud Architect, has been tasked by the CEO to review the company's cloud architectures.

Before diving into the source code and existing architectures, Alice first wants to see if they could have done better during the migration.

Cost Driven Design

Cost-driven design is a common name in manufacturing and other industries for designing a product according to cost & usage. The approach can also be applied in a cloud context.

Applying cost driven design rules is a big challenge because you need to consider:

  • Business objectives
  • High availability
  • Maintainability
  • Cost effectiveness

âžĄïž These goals often compete!

It became immediately apparent to Alice that these considerations were ignored during the migration.

There are many tools on the market that can help understand existing cost & usage, such as Embotics, Flexera, Cloudcheckr and Cloudability.

To avoid adding more cost to the company by subscribing to a SaaS solution, she decided to try to understand cloud billing first.

Cloud cost & usage

Cloud providers charge for resources in two ways: time based and consumption based.

Time based vs consumption based

Time based

  • Execution time

Consumption based

  • CPU Utilization
  • Memory Utilization
  • Disk I/O
  • Data Transfer
  • Requests
  • Data Stored
  • Total Cost $

Sometimes you are billed on both!
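
To make the two charge types concrete, here is a minimal Python sketch of a bill that combines a time-based charge with consumption-based charges. The rates, the metric names and the `estimate_bill` helper are hypothetical and chosen only for illustration; they are not real provider prices.

```python
from decimal import Decimal

# Hypothetical unit prices, for illustration only (not real provider rates).
HOURLY_RATE = Decimal("0.10")          # time based: price per instance hour
CONSUMPTION_RATES = {                  # consumption based: price per unit
    "data_transfer_gb": Decimal("0.09"),
    "requests_million": Decimal("0.40"),
    "data_stored_gb": Decimal("0.02"),
}


def estimate_bill(hours: int, usage: dict[str, Decimal]) -> dict[str, Decimal]:
    """Split an estimated bill into its time-based and consumption-based parts."""
    time_based = HOURLY_RATE * hours
    consumption_based = sum(
        (CONSUMPTION_RATES[metric] * quantity for metric, quantity in usage.items()),
        Decimal("0"),
    )
    return {
        "time_based": time_based,
        "consumption_based": consumption_based,
        "total": time_based + consumption_based,
    }


# An instance that runs all month (720 h) and also serves traffic is billed on both.
bill = estimate_bill(
    hours=720,
    usage={
        "data_transfer_gb": Decimal("500"),
        "requests_million": Decimal("10"),
        "data_stored_gb": Decimal("200"),
    },
)
print(bill)
```

With no traffic at all, only the time-based part remains, which is why an idle instance still costs money.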

Let's see an example with Netflix [2]:

At Netflix, season 2 of Stranger Things was shot in 8K and has nine episodes.
The source files are terabytes of data.
Each season required 190,000 CPU hours to encode.
That's the equivalent of running 2,965 m4.16xlarge instances for an hour.
The price of an m4.16xlarge is $3.20 per hour, so that hour costs Netflix a total of $9,488.
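
The arithmetic behind that figure can be checked with a few lines of Python. The instance count and the hourly price are the ones quoted above; the variable names are only illustrative.

```python
from decimal import Decimal

instances = 2965                      # m4.16xlarge instances, as quoted above
price_per_hour = Decimal("3.20")      # USD per instance hour, as quoted above

# Cost of running all the instances for one hour.
cost_for_one_hour = instances * price_per_hour
print(f"${cost_for_one_hour:,.2f}")   # prints $9,488.00
```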

What if Netflix had chosen a cloud other than AWS?

As we can see, a strict comparison between clouds is not meaningful: the price changes depending on the instance type.

Alice realized that the billing service helps to understand the cost of cloud services, not the cost of business applications.

From a business application point of view, there are two different costs: mandatory cost and operational cost.

Mandatory cost vs operational cost

Mandatory costs are all the costs that you are aware of in advance. When you create a new instance with a specific instance type, you know in advance how much you'll be charged. They are also called acquisition costs or visible costs.

  • instance hours
  • instance type
  • volume type
  • ...

The following diagram illustrates the mandatory cost. Alice notices that if there is no network traffic, we will only be billed for the allocated resources. She also notices that in this scenario, using serverless applications saves money.

Operational costs are the costs incurred during the operation of the application. You won't know in advance how many user requests the app will receive or how much data you will store tomorrow. They are also called hidden costs.

  • storage
  • data transfer
  • disk I/O
  • requests
  • ...

The following diagram illustrates the operational cost. This time Alice notices that if there is network traffic, we will be billed for the allocated resources but also for the serverless resources. Network traffic itself could also be charged, depending on the network topology.

As zShop has internet-facing applications, Alice understood that operational cost could be the highest cost.

Different types of cost-overspending

Cost & usage differ from one business strategy to another. Overspending can come from:

  • Storage: businesses depending on applications that store massive amounts of data, like satellite images, logs, data lakes, etc.
  • Compute: businesses depending on applications that use intensive computing, like SAP, Elasticsearch, graph databases, big data processing, machine learning model building, etc.
  • Disk: businesses depending on applications that use disk I/O heavily, like unmanaged databases or unmanaged NFS.
  • Fully managed services: businesses depending on applications that use fully managed services, like AI, video, IoT or data analytics services.
  • Network: businesses depending on applications that have public APIs, are internet facing, use inter-regional data transfer or communicate with external workloads (e.g. Hybrid Cloud).

So Alice investigated how to track these data transfer costs [1].

She noticed that there is no common pattern or best practice to control network traffic, apart from using standard services like Amazon CloudWatch to create alarms based on egress data.
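
As an illustration, such an alarm on egress data could be described as follows. This is a sketch under assumptions: the `egress_alarm_params` helper, the alarm name, the instance ID and the 50 GiB threshold are hypothetical. The dictionary uses the parameters of the CloudWatch `PutMetricAlarm` API and the EC2 `NetworkOut` metric, and is meant to be passed to boto3's `put_metric_alarm`.

```python
def egress_alarm_params(instance_id: str, max_gib_per_hour: int) -> dict:
    """Build PutMetricAlarm parameters for an hourly outbound-traffic threshold."""
    return {
        "AlarmName": f"egress-above-{max_gib_per_hour}gib-{instance_id}",  # hypothetical name
        "Namespace": "AWS/EC2",
        # NetworkOut counts all bytes sent by the instance,
        # not only the traffic that is billed as egress.
        "MetricName": "NetworkOut",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Sum",
        "Period": 3600,                                # one-hour window, in seconds
        "EvaluationPeriods": 1,
        "Threshold": max_gib_per_hour * 1024 ** 3,     # GiB converted to bytes
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = egress_alarm_params("i-0123456789abcdef0", max_gib_per_hour=50)
print(params["Threshold"])

# With AWS credentials configured, the alarm could then be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

An alarm like this only warns after the traffic has happened; it does not cap the cost.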

Optimizing existing workloads can start with adopting a cloud culture that helps to avoid wasting resources.

"35 percent of cloud bills are wasted due to inefficiencies"
– RightScale “State of the Cloud” 2018

Cost Culture

Culture consists of the attitudes and behaviors that define how a business operates. [3]

Every employee in the organisation should embrace this culture and see the change as normal. Engineering, finance, operations and business development teams, as well as C-level executives, need to think about cost optimization.

All cloud providers give guidance on cost optimization best practices and lessons learned from their customers.

In AWS, Cost Optimization is one of the five pillars of the Well-Architected Framework.

"A cost-optimized system will fully utilize all resources, achieve an outcome at the lowest possible price point, and meet your functional requirements"

The objective of such a framework is to help customers go from...

Pay for what you use

to...

Pay for what you need

So far Alice has learned the following lessons:

  • The cost-driven design approach has not been adopted.
  • The current cloud billing dashboard does not help to control the cloud cost & usage of the business applications.
  • The current culture needs to be improved.

Now Alice can look at the architectures put in place and take a closer look at the cost impact.

The myth

Alice started to evaluate an internal Web application migrated from on-premise following a lift & shift strategy.

At first, the developers accessed the app from a public IP address; the architecture looked like this.

After some testing, they judged that the application could be deployed in a staging environment to be tested by the sales team. The only thing to add was a domain name with a DNS record pointing to the public IP address of the instance.

Alice remembered that, before the deployment to the staging environment, a colleague from the Ops team with a long beard and long hair stepped in.

"You cannot deploy in a staging environment without autoscaling and a load balancer enabled!"

He was shortly joined by a bald security colleague with round black glasses.

"You cannot deploy your instance in a public subnet. It must be in a private subnet. The Web application must be secured with SSL certificates and user authentication!"

After having deployed all these new resources in the infrastructure and having received the business validation, the development team wanted to deploy to production. Once again, the security and ops teams said: "Wait!!"

Security team: "You cannot deploy to production if you do not encrypt the data and save the database passwords in a secret database!"

Ops team: "You cannot deploy to production if you do not create backups for your persistent disks!"

The network team, which we haven't seen from the start, appears with red caps and says: "No production possible if a CDN is not in front of your Web application AND a WAF is not attached to the CDN and the load balancer, with firewall rules allowing only our network IP ranges."

The development team started by using 4 services to perform the migration and ended up with 14 services!

Alice noticed that 40% of the cost impact was unrelated to business need.
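
A share like this can be computed once every service is labelled with the team that required it. The sketch below assumes such labels exist; the service names, the monthly costs and the `non_business_share` helper are hypothetical and do not reproduce Alice's actual figures.

```python
from decimal import Decimal

# Hypothetical monthly costs (USD), labelled by who required the service.
services = [
    ("compute instance", "business", Decimal("120")),
    ("database", "business", Decimal("90")),
    ("load balancer", "ops", Decimal("25")),
    ("backups", "ops", Decimal("15")),
    ("WAF", "network", Decimal("30")),
    ("key management", "security", Decimal("20")),
]


def non_business_share(items) -> Decimal:
    """Return the fraction of the total cost not required by the business itself."""
    total = sum((cost for _, _, cost in items), Decimal("0"))
    overhead = sum(
        (cost for _, owner, cost in items if owner != "business"), Decimal("0")
    )
    return overhead / total


print(f"{non_business_share(services):.0%}")
```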

So Alice wanted to analyze a fully Serverless Internet application to see whether it made a difference.

This time too, the security, ops and network teams asked to deploy the same services. So the same impact on costs!

But this time, with Serverless services, the cost impact on the business application was different.

Alice concluded that the cost estimate made before the migration, based on the resource usage of the on-premise infrastructure, was not sufficient at all. The reality was different.

Alice wrote a report and sent it to the corporation.

The reality

In the public cloud, there will be necessary costs for security & compliance reasons (e.g. firewall, auditing, encryption, 3rd-party)

and for disaster recovery (e.g. backups, snapshots, replication (fault tolerance), redundancy).

Necessary costs for DevOps

Unpredictable costs on the network

  • Data transfer: egress, cross-region, cross-account, cross-platform.

Unpredictable costs due to human errors

  • Infinite loops, bad architecture, accidents

Unpredictable costs during application operations

  • Requests, data stored, data retrievals, events, throughput, disk I/O...

Unpredictable costs incurred by 3rd-party tools, which we cannot optimize

  • Open source, Docker images, commercial licences...

But we will have estimable costs, like the mandatory costs for compute, block storage, allocation, etc.

And estimable costs from cloud SaaS solutions (zero code) for AI, IoT, data analytics, media, mobile, games, etc.

Save money & time

What I always recommend to customers is using:

  • Managed services as much as possible
  • Dedicated network connections
  • The right architecture for the right business need
  • Content delivery networks
  • Storage lifecycle
  • Cloud discounts
  • Infrastructure as code
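
For the storage lifecycle item in the list above, here is a hedged sketch of what a lifecycle rule can look like on Amazon S3. The `log_lifecycle_rule` helper, the prefix, the rule ID and the day thresholds are hypothetical; the dictionary follows the shape expected by the S3 `PutBucketLifecycleConfiguration` API.

```python
def log_lifecycle_rule(prefix: str = "logs/") -> dict:
    """Build a lifecycle configuration that moves ageing objects to cheaper tiers."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",          # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},          # delete after one year
            }
        ]
    }


config = log_lifecycle_rule()
print(config["Rules"][0]["Transitions"])

# With AWS credentials configured, it could be applied with:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=config)
```

The right thresholds depend on how often the data is actually read, since colder tiers usually charge for retrieval.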

So:

  • Use Serverless as much as possible for business applications. Otherwise, use PaaS services.
  • Use managed services as much as possible for non-business tools such as databases, load balancers, data/server migration, messaging, ML model building, data analytics, big data processing, etc. Otherwise, use managed container services.

Unless you are looking for portability [4]:

“If you believe in the cloud, you can't be agnostic.”

Final words

The business creates the cost, the network propagates the cost.

If we had a cost view like this for our business applications, our lives would be easier.
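
One way to approximate a per-application cost view is to tag every resource with the application it serves and to group the billing line items by that tag. The sketch below assumes a hypothetical `app` tag and made-up line items; untagged resources are grouped under `untagged` so that they stay visible.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical billing line items: (service, tags, monthly cost in USD).
line_items = [
    ("compute", {"app": "webshop"}, Decimal("120.00")),
    ("data transfer", {"app": "webshop"}, Decimal("45.50")),
    ("database", {"app": "backoffice"}, Decimal("80.00")),
    ("NAT gateway", {}, Decimal("32.00")),   # shared resource without a tag
]


def cost_per_application(items) -> dict[str, Decimal]:
    """Sum the cost of all line items per value of the 'app' tag."""
    totals: dict[str, Decimal] = defaultdict(lambda: Decimal("0"))
    for _service, tags, cost in items:
        totals[tags.get("app", "untagged")] += cost
    return dict(totals)


for app, cost in sorted(cost_per_application(line_items).items()):
    print(f"{app}: ${cost}")
```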

I wrote an article for the JDN about this concept but in French.

Cloud : le métier crée le coût, le réseau le propage

If you have any questions or feedback, please feel free to leave a comment.

Otherwise, I hope I have helped you answer some of the hard questions about cloud cost.

By the way, do not hesitate to share with peers 😊

Thanks for reading!

Documentation:

[1] https://www.zdnet.com/article/cloud-customers-pairing-aws-microsoft-azure-more-according-to-kentik/
[2] https://www.simform.com/compute-pricing-comparison-aws-azure-googlecloud/
[3] https://medium.com/faun/cloud-cost-optimization-add-value-save-money-5dc4a16b08fe
[4] https://www.contino.io/insights/true-cost-being-cloud-agnostic
[5] https://cloudcheckr.com/cloud-cost-management/3-ways-to-reduce-cloud-costs/
