Author: Senthil Raja Chermapandian is a Principal Software Engineer at Ericsson. He is a Certified Kubernetes Administrator (CKA), maintainer of the open source project “kube-fledged”, tech blogger, speaker and organizer of KCD Chennai. He specialises in Machine Learning, Distributed Systems, Edge Computing, Cloud-Native Software Development, WebAssembly, Kubernetes and Google Cloud Platform.
Introduction:
Managing Applications in Production has traditionally been a Human-centric affair. Large Ops Teams were tasked with managing the day-to-day operations of the running Apps. These teams had experts with deep domain and functional knowledge of the App and its Infrastructure, and often relied on heroes to save the day when an outage occurred.
When the number of Apps and their complexity started growing, inefficiencies and cost overruns crept in. IT Organizations turned to Automation Tools to address these challenges. As a result, the Ops team evolved to become more Tool-centric, relying on tools, automation and scripts for tasks like monitoring, alerting, patching, backup/restore etc.
Code-centric App Management:
In due course, Organizations started embracing Cloud-Native Applications and Infrastructure. The fundamental approach to designing and building Apps changed: Cloud, Containers and Micro-services took center stage, and users began to take elasticity, scalability and resiliency for granted. This trend created new challenges and paved the way for the widespread adoption of DevOps and SRE approaches to improve operational efficiency. Relying on Humans and Tools alone is no longer enough for managing Cloud-Native Applications.
The answer is a Code-centric approach to managing a Cloud-Native App in Production. In a Code-centric approach, the Ops team transforms itself into a group of Software engineers whose goal is to codify all of the domain knowledge and operational tasks required for managing a Cloud-Native App. Code becomes the fundamental asset for managing the App, with Humans and Tools augmenting it wherever needed. This codified asset is capable of performing all the operational tasks for managing the App. Let’s call this codified asset an Ops-App. An Ops-App is often built by SREs using the same language, framework and constructs used by the App it manages. The Ops-App handles all the operational activities of the App, such as Deployment, Upgrade, Patching, Backup/Restore, Monitoring, Alerting and Scaling.
What is a Kubernetes Operator?
Kubernetes allows us to pool Compute Resources from a group of physical and virtual servers, and makes these resources available to Containerized applications on-demand. It manages the lifecycle of containers, and provides various critical capabilities: automation, self-healing, persistent volumes, RBAC, auto-scaling etc.
A Kubernetes Operator, in simple terms, is an Ops-App for a Containerized Cloud-Native App running in Kubernetes. Kubernetes has inherent support for several basic App management tasks like self-healing, scaling, monitoring etc. A Kubernetes Operator “amplifies” these basic capabilities into a full-blown Ops-App that has the entire domain knowledge of the App embedded in it. It does this by extending the capabilities of the Kubernetes cluster that hosts the App: Kubernetes allows users to add new API resources via a feature called Custom Resource Definition (CRD). A user can then use the native Kubernetes command line tool (kubectl) or APIs to interact with the Ops-App, in the same way the user interacts with the Kubernetes control plane. And you get to run both the App and the Ops-App in the same Kubernetes cluster, allowing you to use existing CI/CD pipelines to manage the release of the App and its corresponding Ops-App. In Kubernetes parlance, the Ops-App is called an Operator.
Let’s assume the Dev Team has written a Java Spring Boot Application and packaged it into a Container Image. The Ops Team has to deploy this App into a Production Kubernetes Cluster and manage its lifecycle. Rather than deploying this App directly, the Ops team would write an Operator for it. The Operator “knows” how to programmatically deploy the App into the Cluster with the right configuration, resource requirements etc. The Operator can also be written so that it knows “much more” than deploying the App: it knows how to apply patches, roll out upgrades, back up the App’s data, monitor the App’s metrics, initiate scaling, restore a backup etc. A human needs to intervene only for tasks the Operator cannot perform. As a result, managing the App becomes highly efficient and cost-optimized. This is a powerful pattern for managing Cloud-Native Apps in Kubernetes.
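Operators commonly implement this behaviour as a reconcile loop: the Operator repeatedly compares the desired state declared in its custom resource with what is actually running, and takes whatever actions close the gap. The sketch below models only that comparison in plain Go. The types `AppSpec` and `ObservedState` and the function `Reconcile` are hypothetical; a real Operator would read the observed state from the Kubernetes API and act on it (for example through a library such as controller-runtime) instead of returning descriptive strings.

```go
package main

import "fmt"

// AppSpec is the desired state a user declares in a (hypothetical) custom resource.
type AppSpec struct {
	Image    string
	Replicas int
}

// ObservedState is what the Operator currently sees running in the cluster.
// In a real Operator this would be read from the Kubernetes API.
type ObservedState struct {
	Deployed bool
	Image    string
	Replicas int
}

// Reconcile compares desired and observed state and returns the actions
// needed to converge them. When the states already match it returns no
// actions, so it is safe to call repeatedly.
func Reconcile(desired AppSpec, observed ObservedState) []string {
	if !observed.Deployed {
		// Nothing running yet: a fresh deployment covers image and replicas.
		return []string{fmt.Sprintf("deploy %s with %d replicas", desired.Image, desired.Replicas)}
	}
	var actions []string
	if observed.Image != desired.Image {
		actions = append(actions, fmt.Sprintf("roll out upgrade %s -> %s", observed.Image, desired.Image))
	}
	if observed.Replicas != desired.Replicas {
		actions = append(actions, fmt.Sprintf("scale %d -> %d replicas", observed.Replicas, desired.Replicas))
	}
	return actions
}

func main() {
	desired := AppSpec{Image: "shop-api:1.1", Replicas: 3}
	observed := ObservedState{Deployed: true, Image: "shop-api:1.0", Replicas: 2}
	for _, a := range Reconcile(desired, observed) {
		fmt.Println(a)
	}
}
```

Running it prints `roll out upgrade shop-api:1.0 -> shop-api:1.1` followed by `scale 2 -> 3 replicas`. Because a converged state produces no actions, the Operator can re-run the same function on every change or on a timer without side effects; further operational knowledge (patching, backup, restore) would be added as more comparisons of this kind.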
Operator Capability Levels
Operators come in different maturity levels with regard to the lifecycle management capabilities they offer for the application or workload they manage. The Operator Framework’s capability model provides common terminology for describing which features users can expect from an operator. It defines five levels: Level 1 Basic Install, Level 2 Seamless Upgrades, Level 3 Full Lifecycle, Level 4 Deep Insights and Level 5 Auto Pilot.
Each capability level is associated with a certain set of management features the Operator offers around the managed workload. Operators that do not manage a workload and/or delegate to off-cluster orchestration services remain at Level 1. Capability levels are cumulative: Level 3 requires all the capabilities of Levels 1 and 2.
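Because the levels are cumulative, an Operator’s level is the highest one for which it also satisfies every lower level. The following Go sketch encodes that rule, simplifying each level to a single yes/no flag. The level names come from the Operator Framework capability model; the `CapabilityLevel` type and the `AchievedLevel` function are hypothetical illustrations, not part of any Operator Framework API.

```go
package main

import "fmt"

// CapabilityLevel numbers follow the Operator Framework capability model.
type CapabilityLevel int

const (
	BasicInstall     CapabilityLevel = iota + 1 // Level 1
	SeamlessUpgrades                            // Level 2
	FullLifecycle                               // Level 3
	DeepInsights                                // Level 4
	AutoPilot                                   // Level 5
)

// AchievedLevel returns the highest level L such that the Operator supports
// every level from 1 through L. A gap at a lower level caps the result even
// if higher-level features exist. It returns 0 if Level 1 is not met.
func AchievedLevel(supported map[CapabilityLevel]bool) CapabilityLevel {
	var level CapabilityLevel
	for l := BasicInstall; l <= AutoPilot; l++ {
		if !supported[l] {
			break
		}
		level = l
	}
	return level
}

func main() {
	// Has Level 1, 3 and 4 features but no seamless upgrades (Level 2).
	ops := map[CapabilityLevel]bool{BasicInstall: true, FullLifecycle: true, DeepInsights: true}
	fmt.Println(AchievedLevel(ops))
}
```

In the example, an Operator with Level 3 and 4 features but no seamless upgrades is still credited with Level 1 only, so the program prints `1`.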
The Operator Framework
A Kubernetes Operator brings much-needed efficiency to managing a Containerized App. Excellent! However, writing an Operator today can be difficult because of challenges such as using low-level APIs, writing boilerplate code, and a lack of modularity which leads to duplication. The Operator Framework takes the pain out of writing a Kubernetes Operator. It is an open source toolkit to manage Operators in an effective, automated, and scalable way. It provides an SDK, the Operator SDK, which can be downloaded and used to scaffold and build Operators. Refer to this blog to get an in-depth understanding of using the Operator SDK for writing an Operator.
If you are looking for pre-built, readily usable Operators, you can find them at operatorhub.io. This is a collection of open source Operators for popular Applications. You can either use them as-is or download the source code and modify it for your use case. As of this writing, the repository has 209 Operators for Apps ranging from Akka to Wildfly. You can also submit your own Operator to operatorhub.io for others to discover and use.
Conclusion:
Kubernetes Operators can be very valuable in managing Cloud-Native Applications in Kubernetes. If you can build a group of SREs with the right skillset for writing and maintaining Operators, you stand to benefit a lot. I hope this article has given you a big-picture view of Kubernetes Operators and their benefits. Share your feedback in the comments section.
Join us
Register for Kubernetes Community Days Chennai 2022 at kcdchennai.in