Kubernetes Cost Management and Analysis Guide

Tony Chan - Sep 22 '21 - Dev Community

The popularity of Kubernetes is constantly increasing, with more and more companies moving their workloads to this form of orchestration. Some organizations develop new applications exclusively on Kubernetes, taking advantage of the architecture designs it enables. Others move their existing infrastructure to Kubernetes in a lift-and-shift manner. While some tools offer native cost-analysis features, these often provide only a simplistic overview.

Running your workloads in Kubernetes can bring many benefits, but costs become more difficult to manage and monitor. In this article, we'll examine the key reasons why cost is so hard to manage in Kubernetes, and you'll gain insight into how you can improve your cost management significantly.

Traditional vs. Kubernetes Resource Management

Architecture Overview

Before diving into cost management, it's important to first understand how the underlying resources differ. We'll use a simple webshop as an example. This webshop contains three distinct components: a frontend service, a cart service, and a product service. The frontend service is responsible for serving the user interface. The cart service is responsible for saving a customer's order in the database. Lastly, the product service is an API that other services, like the frontend, can query to get product information. An actual webshop will naturally be more complicated, but we'll stick with this as an example.

Traditional Architecture

Traditionally, you would spin up each service on its own pool of VMs, sizing each pool appropriately. This makes it easy to see the cost of each service; you just need to look at the bill. For example, you can quickly see that the product service is consuming a lot of resources and start investigating.

Since traditional architecture has been around for so long, many tools—especially cloud providers—are used to reporting costs this way. This isn't the case for Kubernetes.

Kubernetes Architecture

It's possible to re-create the traditional architecture in Kubernetes with a dedicated node pool for each service, but this isn’t the best practice. Ideally, you should use a single or a few pools to host your applications, meaning the three distinct services can run on the same set of nodes. Because of this, your bill can’t tell you what service is taking up what amount of resources.

Kubernetes does provide you with standard metrics like CPU and RAM usage per application, but it’s still tough to decipher not only what is costing you a lot, but specifically how you can lower costs. Given Kubernetes’ various capabilities, many strategies can be implemented to lower costs.

Strategies can involve rightsizing nodes, which isn't too different from a traditional architecture, but Kubernetes also offers something new: it lets you rightsize Pods. Using requests and limits, and choosing appropriately sized nodes, you can make sure Pods are packed efficiently onto your nodes for optimal utilization.
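
As a minimal sketch of what that looks like in practice (the Deployment name product-service, the shop namespace, and the sizing values below are all hypothetical), the official Kubernetes Python client can patch requests and limits onto a running Deployment:

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside the cluster

# Hypothetical sizing for the product service; derive real values from observed usage.
resources = client.V1ResourceRequirements(
    requests={"cpu": "250m", "memory": "256Mi"},  # what the scheduler reserves on a node
    limits={"cpu": "500m", "memory": "512Mi"},    # hard ceiling enforced at runtime
)

apps = client.AppsV1Api()
deployment = apps.read_namespaced_deployment("product-service", "shop")
deployment.spec.template.spec.containers[0].resources = resources
apps.patch_namespaced_deployment("product-service", "shop", deployment)
```

Requests are what determine how tightly Pods can be packed onto nodes, so keeping them close to real usage is what drives utilization up and cost down.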

Comparing Architectures

While Kubernetes offers many advantages over a traditional architecture, moving your workload to this orchestrator does present challenges. Kubernetes requires extra focus on cost; it won’t be possible to simply look at the bill and know what resources are costing a lot.

With Kubernetes, you should look into using specialized tools for cost reporting. Many of these tools include recommendations on how to lower your cost, which is especially useful. Let's take a deeper dive into how you would manage cost in a Kubernetes setup.

Managing Kubernetes Costs

Managing costs in Kubernetes is not a one-and-done process. There are a number of pitfalls that, if overlooked, could result in businesses experiencing higher costs than what they may have predicted. Let’s talk about some areas where you should be on the lookout for opportunities to mitigate costs.

Kubernetes Workload Considerations

First, understand the nature of your application and how it translates to a cluster environment. Does it consist of long-lived services or batch jobs that run on a trigger? Are its components stateful (e.g., databases) or stateless?

The answers to these questions should inform the decision-making process around what Kubernetes objects need to be created. Ensuring that your environment only runs the necessary resources is a key step to cost optimization.
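
As a rough decision aid rather than an exhaustive rule, the usual pairing of workload type to Kubernetes object looks something like this:

```python
# A rough decision aid, not an exhaustive mapping: which Kubernetes object fits which workload.
WORKLOAD_TO_OBJECT = {
    "long-lived, stateless service": "Deployment",
    "long-lived, stateful service (e.g. a database)": "StatefulSet",
    "batch job triggered on demand": "Job",
    "batch job that runs on a schedule": "CronJob",
    "one Pod per node (agents, log shippers)": "DaemonSet",
}

for workload, obj in WORKLOAD_TO_OBJECT.items():
    print(f"{workload:50s} -> {obj}")
```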

Kubernetes Workload Resource Management

Once you have a clear picture of your resources, you can set some limits and configure features like Horizontal Pod Autoscaling (HPA) to scale pods up and down based on utilization. HPAs can be configured to operate based on metrics like CPU and memory out of the box, and can be additionally configured to operate on custom metrics. As you analyze your workload, you can further modify the settings that determine the behavior of your resources.
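
Here is a hedged sketch of the out-of-the-box CPU case using the autoscaling/v1 API via the Kubernetes Python client; the Deployment name, namespace, and thresholds are placeholders you would tune to your own workload:

```python
from kubernetes import client, config

config.load_kube_config()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="product-service"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="product-service"
        ),
        min_replicas=2,                         # keep a baseline for availability
        max_replicas=10,                        # cap spend during traffic spikes
        target_cpu_utilization_percentage=70,   # scale out above 70% average CPU
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler("shop", hpa)
```

Memory-based and custom-metric scaling go through the autoscaling/v2 API instead, but the principle is the same.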

Kubernetes Infrastructure Resource Management

Managing Kubernetes costs around infrastructure can be especially tricky as you try to figure out the right type of nodes to support your workloads. Your node types will depend on the applications, their resource requirements, and factors related to scaling.

Operators can configure monitoring and alerts to keep track of how nodes are coping and what occurrences in your workload may be triggering scaling events. These kinds of activities can help organizations save costs related to overprovisioning by leveraging scaling features and tools like Cluster Autoscaler to scale nodes when necessary.

Leveraging Observability

In the same vein as the previous point, your organization can make more informed decisions regarding your Kubernetes cluster size and node types by monitoring custom application metrics (ie, requests per second) along with CPU, memory, network, and storage utilization by Pods.

Optimizing Kubernetes Cost with Monitoring

One of the main ways to optimize the costs associated with running Kubernetes clusters is to set up the correct tooling for monitoring. You’ll also need to know how to react to the information you receive, and make sure it’s given to you effectively.

Barometer is coming soon to CloudForecast, which will be helpful for use cases like this.

Monitoring Kubernetes Cluster Cost

The first things you need to monitor in your Kubernetes Cluster are CPU and memory usage. These metrics give you a quick overview of how many resources your Kubernetes cluster is using. By making sure resources in your Kubernetes cluster are correctly tagged using labels or namespaces, you’ll quickly learn what services are costing the most in your organization.
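
For example, a consistent labeling scheme makes usage, and therefore cost, attributable to a team or service. The label keys below are only a suggested convention, and the Deployment and namespace names are hypothetical:

```python
from kubernetes import client, config

config.load_kube_config()

# Suggested cost-attribution labels; the keys and values here are only a convention.
cost_labels = {"team": "storefront", "app": "product-service", "env": "production"}

apps = client.AppsV1Api()
# Patch the labels onto both the Deployment and its Pod template so every Pod carries them.
apps.patch_namespaced_deployment(
    "product-service", "shop",
    {"metadata": {"labels": cost_labels},
     "spec": {"template": {"metadata": {"labels": cost_labels}}}},
)
```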

The easiest way to monitor these metrics is via automated reporting. CloudForecast’s upcoming tool will be able to consolidate these reports and deliver them to your team by email or Slack. This ensures each team is aware of how their services are performing, and whether they're using up too many costly resources.

Setting up a general overview is highly recommended. Additionally, you should also ensure you get notified if something out of the ordinary happens. For example, you’ll want to be notified if the product service suddenly starts costing a lot more; this allows you to troubleshoot why and work on fixes.

Kubernetes exposes various metrics you can use to determine the cost of a specific service. Using the /metrics endpoints provided by the Kubernetes API, you can get a view into per-Pod CPU and memory utilization. With these metrics it becomes easier to see which workloads are driving which costs. Tools like CloudForecast's Barometer use these metrics to calculate how much each Pod costs in dollars. Having this overview, and establishing a baseline cost for your Kubernetes cluster, will help you know when costs are rising too rapidly, and exactly where it's happening. Knowing how cAdvisor works with Prometheus, and which metrics they collectively expose, is incredibly valuable when you want to examine your clusters.
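
As a rough sketch of pulling those numbers yourself (this assumes the metrics-server add-on is installed, which serves the metrics.k8s.io API that kubectl top also reads), you can list per-Pod CPU and memory usage and use it as the basis for cost allocation:

```python
from kubernetes import client, config

config.load_kube_config()

# metrics.k8s.io is served by the metrics-server add-on; cAdvisor/Prometheus expose
# richer series (e.g. container_cpu_usage_seconds_total) if you need history.
metrics = client.CustomObjectsApi().list_cluster_custom_object(
    group="metrics.k8s.io", version="v1beta1", plural="pods"
)

for pod in metrics["items"]:
    name = f'{pod["metadata"]["namespace"]}/{pod["metadata"]["name"]}'
    for container in pod["containers"]:
        usage = container["usage"]
        print(name, container["name"], usage["cpu"], usage["memory"])
```

Multiplying these readings by your per-vCPU and per-GiB node prices is, in essence, what a tool like Barometer automates for you.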

While there are many metrics you could analyze, RAM and CPU are typically the ones to focus on, as these are what drive your provider to allocate more resources. You can think of RAM and CPU metrics as the symptoms of your cost: with a proper overview, they tell you which workloads are costing more than normal, and from there you can drill into the service to figure out why.

Acting on Monitoring Data

Once you've been notified of irregularities in your Kubernetes cluster, you need to act. There are many valid strategies for lowering Kubernetes cluster cost. As mentioned earlier, a good first step is to rightsize your nodes and Pods so they run efficiently. Whatever steps you take to optimize cost, doing so manually is tough.

Tools can automatically suggest why your cost is high and how to reduce it. This allows you not only to implement cost optimizations quickly, but also to uncover solutions that otherwise wouldn't have come to mind.

What to Monitor

Tools can help a lot, but they’re wasted without a good foundation. To set up a good foundation, you should determine a set of Key Performance Indicators (KPIs) to monitor. A great KPI example is the number of untagged resources. Having your resources tagged allows your tool reports to be more precise, delivering better optimizations.

You could also monitor the total cost of your untagged resources. This can act as motivation for getting your resources tagged, and remind your team to maintain a good baseline approach when setting up new resources. Tracking your KPIs before and after the introduction of a tool is a great way to determine how much it actually helps. In any case, determining KPIs will make sure you’re on top of what's happening in your Kubernetes cluster.
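
A minimal sketch of that KPI, assuming a team label is your tagging convention (swap in whatever scheme you actually use), simply counts the Pods that are missing it:

```python
from kubernetes import client, config

config.load_kube_config()

REQUIRED_LABEL = "team"  # hypothetical tagging convention

pods = client.CoreV1Api().list_pod_for_all_namespaces()
untagged = [
    f"{p.metadata.namespace}/{p.metadata.name}"
    for p in pods.items
    if REQUIRED_LABEL not in (p.metadata.labels or {})
]

print(f"Untagged Pods: {len(untagged)} of {len(pods.items)}")
for name in untagged:
    print(" -", name)
```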

How to Develop a Unit Cost Calculator

Crucial to understanding your cost is knowing how to use the AWS Pricing Calculator. It helps you compare the costs of running a self-hosted Kubernetes cluster versus an Amazon EKS cluster.

The CPU (vCPUs) and Memory (GiB) specified in the following example are just for demonstrative purposes and will vary depending on workload requirements.

AWS EKS Cluster Cost and Pricing Estimation

The following calculations are for a Highly Available (HA) Kubernetes cluster with a control plane managed by AWS (EKS) and three worker nodes with 4 vCPUs and 16 GiB of memory each. The instance type used in this case is a t4g.xlarge EC2 instance covered by a 1-year EC2 Instance Savings Plan. This instance type is automatically recommended based on the CPU and memory requirements specified.

Unit Calculations

  • EC2 Instance Savings Plans rate for t4g.xlarge in the EU (Ireland) for 1 Year term and No Upfront is 0.0929 USD
  • Hours in the commitment: 365 days x 24 hours x 1 year = 8760.0 hours
  • Total Commitment: 0.0929 USD x 8760 hours = 813.8 USD
  • Upfront: No Upfront (0% of 813.804) = 0 USD
  • Hourly cost for EC2 Instance Savings Plans = (Total Commitment - Upfront cost)/Hours in the term: (813.804 - 0)/8760 = 0.0929 USD

Please note that you will pay an hourly commitment for the Savings Plans and your usage will be accrued at a discounted rate against this commitment.

Pricing Calculations

  • 1 Cluster x 0.10 USD per hour x 730 hours per month = 73 USD
  • [worker nodes] 3 instances x 0.0929 USD x 730 hours in month = 203.45 USD (monthly instance savings cost)
  • 30 GB x 0.11 USD x 3 instances = 9.90 USD (EBS Storage Cost)

Monthly Cost: 286.35 USD
Annual Cost: 3,436.20 USD
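
The same arithmetic expressed in a few lines of Python, using the EU (Ireland) rates quoted above; your own rates will vary by region, instance type, and commitment term:

```python
HOURS_PER_MONTH = 730

eks_control_plane = 1 * 0.10 * HOURS_PER_MONTH   # managed control plane fee -> 73.00 USD
worker_nodes = 3 * 0.0929 * HOURS_PER_MONTH      # t4g.xlarge Savings Plans rate -> ~203.45 USD
ebs_storage = 3 * 30 * 0.11                      # 30 GB of EBS per worker node -> 9.90 USD

monthly = round(eks_control_plane + worker_nodes + ebs_storage, 2)
print(f"EKS monthly: {monthly} USD, annual: {round(monthly * 12, 2)} USD")
# EKS monthly: 286.35 USD, annual: 3436.2 USD
```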

HA EKS with Savings Plans

Self-Hosted Kubernetes Cluster Pricing and Cost Estimation

The following calculations are for a custom Highly Available (HA) Kubernetes cluster that is self-hosted in AWS and consists of three control-plane nodes and three worker nodes, each with 4 vCPUs and 16 GiB of memory. As in the EKS estimate above, this analysis uses the same instance type for the same reasons.

Unit Calculations

  • EC2 Instance Savings Plans rate for t4g.xlarge in the EU (Ireland) for 1 Year term and No Upfront is 0.0929 USD
  • Hours in the commitment: 365 days * 24 hours * 1 year = 8760.0 hours
  • Total Commitment: 0.0929 USD * 8760 hours = 813.8 USD
  • Upfront: No Upfront (0% of 813.804) = 0 USD
  • Hourly cost for EC2 Instance Savings Plans = (Total Commitment - Upfront cost)/Hours in the term: (813.804 - 0)/8760 = 0.0929 USD

Please note that you will pay an hourly commitment for Savings Plans and your usage will be accrued at a discounted rate against this commitment.

Pricing Calculations

  • [control-plane nodes] 3 instances x 0.0929 USD x 730 hours in month = 203.45 USD (monthly instance savings cost)
  • [worker nodes] 3 instances x 0.0929 USD x 730 hours in month = 203.45 USD (monthly instance savings cost)
  • 30 GB x 0.11 USD x 6 instances = 19.80 USD (EBS Storage Cost)

Monthly Cost: 426.70 USD
Annual Cost: 5,120.40 USD
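
And the self-hosted equivalent, which replaces the flat EKS control-plane fee with three additional instances and their EBS volumes:

```python
HOURS_PER_MONTH = 730
instances = 6                                    # 3 control-plane + 3 worker t4g.xlarge nodes

compute = instances * 0.0929 * HOURS_PER_MONTH   # Savings Plans rate -> ~406.90 USD
ebs_storage = instances * 30 * 0.11              # 30 GB of EBS per instance -> 19.80 USD

monthly = round(compute + ebs_storage, 2)
print(f"Self-hosted monthly: {monthly} USD, annual: {round(monthly * 12, 2)} USD")
# Self-hosted monthly: 426.7 USD, annual: 5120.4 USD
```

The difference of roughly 140 USD per month is what you pay to run and store your own control plane instead of paying the managed EKS fee.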

HA Custom Control Plane with Savings Plans

Conclusion

By now you've learned how Kubernetes architecture differs from traditional architecture. You've learned what challenges arise once you start to manage costs in Kubernetes, and how to keep them under control. Features like labeling and namespacing can have a great impact on the traceability of your cost, allowing you to reap the full benefits of a Kubernetes architecture. Also, you’ve learned how using the AWS Pricing Calculator can help you estimate the costs associated with running your workloads on a custom Kubernetes cluster compared to running an EKS cluster.

Using a tool like CloudForecast’s Barometer can greatly improve the tracking of cost in your cluster. Barometer not only offers you an effective general overview, it also gives you actionable cost optimization insights.

This article was originally published on: https://www.cloudforecast.io/blog/kubernetes-cost-management-and-analysis/
