Kube-fledged: Cache Container Images in Kubernetes

Senthil Raja Chermapandian - Mar 7 '22 - Dev Community

Author: Senthil Raja Chermapandian is a Principal Software Engineer at Ericsson. He is a Certified Kubernetes Administrator (CKA), maintainer of the open source project “kube-fledged”, tech blogger, speaker and organizer of KCD Chennai. He specialises in Machine Learning, Distributed Systems, Edge Computing, Cloud-Native Software Development, WebAssembly, Kubernetes and Google Cloud Platform.

“The Peregrine Falcon is renowned for its speed, reaching over 200 mph during its characteristic high-speed dive, making it the fastest bird in the world, as well as the fastest member of the animal kingdom.” (Source: Wikipedia)

Introduction

When a containerized application is deployed to a Kubernetes cluster, the K8s control plane schedules the Pod onto a worker node. The node agent (kubelet) running on that worker node coordinates with the container runtime installed on the node (e.g. containerd) and pulls the necessary container images from the image registry. Depending on the size of the images and the available network bandwidth, pulling all the images to the node takes time. So, for any containerized application, we should be cognizant of the delay introduced by fetching images from the registry. Traditional applications that run as processes (e.g. managed by systemd) do not suffer from this delay, because all the necessary files are already installed on the machine.

Imagine your containerized application experiences a sudden surge in traffic and needs to scale out horizontally right away (i.e. additional instances need to be created). If you have configured a Horizontal Pod Autoscaler (HPA), the K8s control plane creates additional Pod replicas. However, these Pods cannot handle the increased traffic until the required images are pulled and the containers are up and running. Or assume your application needs to process high-speed real-time data. Such applications have stringent requirements on how rapidly they can be started and scaled, because of the very nature of the purpose they fulfil. In short, there are several use cases where the delay introduced by pulling images from the registry is not acceptable. Moreover, the network connection between the cluster and the image registry could suffer from poor bandwidth, or could be lost entirely. There are scenarios, especially in edge computing, where applications have to gracefully tolerate intermittent network connectivity.

These challenges can be addressed in different ways. One solution that helps immensely in these scenarios is to cache the container images directly on the cluster’s worker nodes, so that kubelet doesn’t need to pull them and can immediately use the images already present on the node. In this blog, I’ll explain how you can use kube-fledged, an open source project, to build and manage a cache of container images in a Kubernetes cluster.

Existing Solutions

Before I introduce you to kube-fledged, let me briefly describe the existing solutions to this problem. The widely used approach is to run a registry mirror inside the cluster. Two common variants are i) an in-cluster self-hosted registry and ii) a pull-through cache. In the former, a local registry runs within the K8s cluster and is configured as a mirror registry in the container runtime. Any image pull request is directed to the in-cluster registry first; if this fails, the request is directed to the primary registry. In the latter, the local registry has caching capabilities: when an image is pulled for the first time, it is cached in the local registry, and subsequent requests for that image are served from the local registry.

Drawbacks of existing solutions

  1. Setting up and maintaining a local registry mirror consumes considerable computational and human resources.
  2. Huge clusters spanning multiple regions need multiple local registry mirrors. This introduces unnecessary complexity when application instances span multiple regions: you might need multiple Deployment manifests, each pointing to the local registry mirror of its region.
  3. These approaches don’t fully meet the requirement for rapid Pod start-up, since there is still a notable delay in pulling the image from the local mirror. Several use cases cannot tolerate this delay.
  4. Nodes might lose network connectivity to the local registry mirror, in which case the Pod is stuck until connectivity is restored.

Overview of kube-fledged

kube-fledged is a Kubernetes add-on or operator for creating and managing a cache of container images directly on the worker nodes of a Kubernetes cluster. It allows a user to define a list of images and the worker nodes onto which those images should be cached (i.e. pulled). As a result, application Pods start almost instantly, since the images need not be pulled from the registry. kube-fledged provides CRUD APIs to manage the lifecycle of the image cache, and supports several configurable parameters to customize its behaviour to one’s needs. (URL: https://github.com/senthilrch/kube-fledged)

kube-fledged is designed and built as a general-purpose solution for managing an image cache in Kubernetes. Though the primary use case is to enable rapid Pod start-up and scaling, the solution supports a wide variety of use cases, as listed below.

Use Cases

  • Applications that require rapid start-up. For example, an application performing real-time data processing needs to scale rapidly in response to a burst in data volume.

  • Serverless functions, since they need to react immediately to incoming events.

  • IoT applications that run on edge devices, because the network connectivity between the edge device and the image registry may be intermittent.

  • Clusters where images need to be pulled from a private registry but not everyone can be granted access to pull from it. The images can instead be made available on the nodes of the cluster.

  • Upgrade roll-outs where a cluster administrator or operator wants to verify beforehand that the new images can be pulled successfully.

How kube-fledged works

Kubernetes allows developers to extend the Kubernetes API via Custom Resources. kube-fledged defines a custom resource of kind “ImageCache” and implements a custom controller (named kubefledged-controller). kubefledged-controller does the heavy lifting of managing the image cache. Users can create and delete ImageCache resources with kubectl commands.

kubefledged-controller has a built-in Image Manager routine that is responsible for pulling and deleting images. Images are pulled or deleted using Kubernetes Jobs. If enabled, the image cache is refreshed periodically by the refresh worker. kubefledged-controller records the status of image pulls, refreshes and image deletions in the status field of the ImageCache resource. kubefledged-webhook-server is responsible for validating the fields of the ImageCache resource.

To create an image cache in your cluster, you only need to write an ImageCache manifest that specifies the list of images to be pulled, along with a nodeSelector. The nodeSelector specifies the nodes onto which the images should be cached. If you want the images cached on all the nodes of the cluster, omit the nodeSelector.

When you submit the manifest to your cluster, the API server sends a validating webhook request (an HTTP POST) to kubefledged-webhook-server. The webhook server validates the cacheSpec of the manifest. Upon receiving a successful response from the webhook server, the API server persists the ImageCache resource in etcd. This triggers an Informer notification to kubefledged-controller, which queues the request. The request is picked up by the image cache worker, which creates multiple image pull requests (one request per image per node) and places them in the image pull/delete queue. These requests are handled by the image manager routine. For every request, the image manager creates a K8s Job that is responsible for pulling the image into the cache. The image manager keeps track of the Jobs it creates and, once a Job completes, places a response in a separate queue. The image cache worker then aggregates all the results from the image manager and finally updates the status section of the ImageCache resource.
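As an illustration, the ImageCache manifest described above could be written out as follows. This is a minimal sketch, not the project’s reference manifest: the apiVersion (kubefledged.io/v1alpha2), the object name imagecache1, the image references and the tier: backend node label are assumptions chosen for the example, so check them against the CRD installed in your cluster before applying.

```shell
# Write a sample ImageCache manifest to a file.
# Assumptions (not from this post): apiVersion, object name, images, node label.
manifest="${TMPDIR:-/tmp}/imagecache1.yaml"

cat > "$manifest" <<'EOF'
apiVersion: kubefledged.io/v1alpha2   # assumption: verify with `kubectl api-resources`
kind: ImageCache
metadata:
  name: imagecache1                   # hypothetical name
  namespace: kube-fledged
spec:
  cacheSpec:
  # No nodeSelector: these images are cached on all nodes
  - images:
    - docker.io/library/nginx:1.21.6
  # With a nodeSelector: these images are cached only on matching nodes
  - images:
    - docker.io/library/redis:6.2.6
    nodeSelector:
      tier: backend
EOF

echo "Manifest written to $manifest"
echo "Submit it with: kubectl apply -f $manifest"
```

Once kube-fledged is running, submitting the file with kubectl apply -f sends it through the validation and pull flow described above, and kubectl get imagecaches -n kube-fledged should then list the new resource.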

kube-fledged has a refresh worker routine that runs periodically to keep the image cache refreshed. If it discovers that an image is missing from the cache (perhaps removed by kubelet’s image garbage collection), it re-pulls the image into the cache. Images with the :latest tag are always re-pulled during the refresh cycle. By default, the refresh cycle is triggered every 5m. Users can change this to a different value, or disable the auto-refresh mechanism completely, when deploying kube-fledged. An on-demand refresh mechanism is also supported, with which users can ask kube-fledged to refresh the image cache immediately.
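A hedged sketch of triggering the on-demand refresh from the shell is shown below. It assumes that a refresh is requested by annotating the ImageCache resource with the key kubefledged.io/refresh-imagecache; that key is an assumption here and should be confirmed against the README of the release you deployed. The function name refresh_imagecache and the cache name imagecache1 are hypothetical.

```shell
# Request an immediate refresh of an ImageCache by annotating it.
# Assumption: the annotation key below matches the installed kube-fledged release.
# $1 = ImageCache name, $2 = namespace (defaults to kube-fledged)
refresh_imagecache() {
  name="$1"
  namespace="${2:-kube-fledged}"
  kubectl annotate imagecaches "$name" -n "$namespace" kubefledged.io/refresh-imagecache=
}
```

For example, refresh_imagecache imagecache1 would request a refresh of a cache named imagecache1 in the kube-fledged namespace.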

Image Cache actions supported by kube-fledged

kube-fledged supports the following image cache actions. All these actions can be performed using kubectl or by directly submitting a REST API request to the Kubernetes API server:

  • Create Image Cache
  • Modify Image Cache
  • Refresh Image Cache
  • Purge Image Cache
  • Delete Image Cache
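Several of these actions map onto ordinary kubectl operations on the ImageCache resource. The helper functions below are a sketch under assumptions: the function names are hypothetical, and the purge annotation key kubefledged.io/purge-imagecache should be verified against the release you run. Creating and modifying a cache are typically done by applying or editing the manifest with kubectl apply -f or kubectl edit.

```shell
# Hypothetical helpers wrapping kubectl for ImageCache lifecycle actions.
# $1 = ImageCache name, $2 = namespace (defaults to kube-fledged)

# Show the ImageCache, including the status section written by the controller.
show_imagecache() {
  kubectl get imagecaches "$1" -n "${2:-kube-fledged}" -o yaml
}

# Purge: ask kube-fledged to remove the cached images from the nodes.
# Assumption: purge is requested via this annotation key.
purge_imagecache() {
  kubectl annotate imagecaches "$1" -n "${2:-kube-fledged}" kubefledged.io/purge-imagecache=
}

# Delete the ImageCache resource itself.
delete_imagecache() {
  kubectl delete imagecaches "$1" -n "${2:-kube-fledged}"
}
```

Wrapping the commands in functions keeps the namespace default in one place; the same commands can of course be typed directly.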

Supported Container Runtimes

  • docker
  • containerd
  • cri-o

Supported Platforms

  • linux/amd64
  • linux/arm
  • linux/arm64

Try kube-fledged

The quickest way to try out kube-fledged is to deploy it using the YAML manifests in the project’s GitHub repo (https://github.com/senthilrch/kube-fledged). You could also deploy it using the Helm chart or the Helm operator. The steps for deploying kube-fledged using the manifests are:

  • Clone the source code repository


$ mkdir -p $HOME/src/github.com/senthilrch
$ git clone https://github.com/senthilrch/kube-fledged.git $HOME/src/github.com/senthilrch/kube-fledged
$ cd $HOME/src/github.com/senthilrch/kube-fledged


  • Deploy kube-fledged to the cluster


$ make deploy-using-yaml


  • Verify that kube-fledged was deployed successfully


$ kubectl get pods -n kube-fledged -l app=kubefledged
$ kubectl get imagecaches -n kube-fledged   # should report 'No resources found'
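If you would rather block until the Pods are ready than poll by hand, kubectl wait can be used with the same label selector and namespace as above. This is a sketch: the function name wait_for_kubefledged and the 120s default timeout are assumptions for illustration.

```shell
# Block until all kube-fledged Pods report Ready, or the timeout expires.
# $1 = timeout (defaults to 120s, an arbitrary value chosen for this sketch)
wait_for_kubefledged() {
  timeout="${1:-120s}"
  kubectl wait --for=condition=Ready pods -l app=kubefledged -n kube-fledged --timeout="$timeout"
}
```

Calling wait_for_kubefledged 300s would allow up to five minutes, which may be useful on clusters where pulling the kube-fledged images is itself slow.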

Similar solutions

Below is a list of similar open source solutions that I have come across. These solutions try to address the problem using alternative approaches. (If you happen to know of other similar solutions, please add a comment to this blog.)

Stargz Snapshotter: Fast container image distribution plugin with lazy pulling (URL: https://github.com/containerd/stargz-snapshotter)

Uber Kraken: Kraken is a P2P Docker registry capable of distributing TBs of data in seconds (URL: https://github.com/uber/kraken)

ImageWolf: ImageWolf is a PoC that provides a blazingly fast way to get Docker images loaded onto your cluster, allowing updates to be pushed out quicker (URL: https://github.com/ContainerSolutions/ImageWolf)

Conclusion

There are applications and use cases that require rapid start-up and scaling. The delay introduced by pulling images from the registry might not be acceptable in such cases. Moreover, the network connectivity to the registry might be unstable or intermittent. And there could be security reasons for not granting all users access to secure registries. kube-fledged can be a simple and useful solution to build and manage a cache of container images directly on the cluster’s worker nodes.

Join us

Register for Kubernetes Community Days Chennai 2022 at kcdchennai.in
