In this blog I’ll be talking about setting up GitHub Chaos Actions using a KinD cluster in your GitHub workflow. To learn more about GitHub Chaos Actions and how to get started with them, please refer to Part-1 of the action series. This blog automates the workflow by creating a KinD cluster inside the CI workflow itself, instead of connecting to an external cluster as we did in Part-1. Before jumping in, let's do a quick recap on Litmus. Litmus is a framework for practicing chaos engineering in cloud-native environments. It provides a chaos operator, a large set of chaos experiments on its hub, detailed documentation, and a friendly community. Litmus is easy to use, and you can set up a quick demo environment to install and run Litmus experiments.
Pre-requisites:
Go
Docker
Note: The above are only required when you're not using the default runner; the ubuntu-latest image has all the dependencies installed in it.
Brief Introduction to GitHub Chaos Action
GitHub Chaos Action helps you automate chaos testing in a cloud-native way using GitHub Actions. It contains a number of LitmusChaos experiments which help you find the weaknesses in your application or the improvements it requires.
We can also say that GitHub Chaos Action is used to create custom software development life cycle (SDLC) workflows directly in your GitHub repository. To know more about setting up a sample workflow using GitHub Chaos Action, please visit Part-1 of the blog.
Why KinD Cluster?
KinD (Kubernetes in Docker) was primarily designed for testing Kubernetes itself, but it has become very popular for local development and CI. But “Why KinD?” You might ask yourself this question while setting up GitHub CI with Chaos Action. To answer it, let’s do a small analysis of what is required.
Easy to set up: We mostly use the GitHub default runner, as it has almost all the capabilities needed to run a simple workflow, and we don’t need an external runner unless we have larger requirements. The GitHub Ubuntu image of the default runner comes preinstalled with kind, so we just need to spin up a cluster with a configuration file or with default values. In this blog I will be using default values to set up KinD, which will create a single-node cluster named ‘kind’.
Compatible: We need a cluster which is very lightweight and easy to use. KinD fits this purpose best. KinD is a tool for running local Kubernetes clusters using Docker container “nodes”.
These are enough reasons to use kind in our CI workflows.
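If the default single-node cluster is not enough, kind also accepts a configuration file. The following is a minimal sketch, assuming a kind release that understands the `kind.x-k8s.io/v1alpha4` config API; the file name `kind-config.yaml` and the one-worker layout are illustrative choices, not something the workflow in this blog uses.

```yaml
# kind-config.yaml (hypothetical file name)
# Assumption: the installed kind release supports the v1alpha4 config API.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  # One control-plane node plus one worker; add more "worker" entries as needed
  - role: control-plane
  - role: worker
```

It would be passed as `kind create cluster --config kind-config.yaml`. Without `--config`, kind falls back to the single-node default used in the rest of this blog.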
New trend in Continuous Integration/Deployment:
One of the new trends in Continuous Integration/Deployment is to:
Create an application image.
Run tests against the created image.
Push image to a remote registry.
Deploy to a server from the pushed image.
It’s also useful when your application already has the Dockerfile that can be used to create and test an image.
We will create a GitHub CI workflow covering the above stages for a particular PR. For that, we need to follow these ten simple steps:
Step-1: Setup a fresh workflow in case you don’t have one
Let’s get started by writing a GitHub CI YAML, if you don’t have one already. We are already familiar with the fields and attributes of the CI YAML from Part-1, so we will move directly to the steps involved in setting up the actions, starting from the following template:
name: Litmus-CI
on:
  # Trigger the workflow on push or pull request,
  # but only for the master branch
  push:
    branches:
      - master
jobs:
  # Job name
  chaos-tests:
    runs-on: ubuntu-latest
    steps:
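One thing to watch for: the steps later in this blog read `github.event.issue.number` and `github.event.comment.body`, and those fields are only populated when the workflow is triggered by an `issue_comment` event. A minimal sketch of such a trigger, assuming you want to start the run by commenting on the PR (this is a variation, not the template above):

```yaml
# Assumption: the workflow is started by a new comment on a pull request.
on:
  issue_comment:
    types: [created]
```

Note that GitHub only runs `issue_comment` workflows from the workflow file on the default branch.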
Step-2: Checkout the latest commit on the Pull Request
When you commit code to your repository, you can continuously build and test the code to make sure that the commit doesn't introduce errors. The error could be a security issue, a functional issue, or a performance issue, each of which can be tested using custom tests, linters, or existing actions. This is where Chaos Actions come in: they perform a chaos test on the application for a particular commit, which in turn helps to track the performance of the application at the commit level. The following lines of code will help you check out the latest commit on the PR:
# Using the last commit id of pull request
- uses: octokit/request-action@v2.x
  id: get_PR_commits
  with:
    route: GET /repos/:repo/pulls/:pull_number/commits
    repo: ${{ github.repository }}
    pull_number: ${{ github.event.issue.number }}
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: set commit to output
  id: getcommit
  run: |
    prsha=$(echo $response | jq '.[-1].sha' | tr -d '"')
    echo "::set-output name=sha::$prsha"
  env:
    response: ${{ steps.get_PR_commits.outputs.data }}
- uses: actions/checkout@v2
  with:
    ref: ${{steps.getcommit.outputs.sha}}
Just add these lines under steps in list format and we are on the latest commit on the PR.
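Newer GitHub runners deprecate the `::set-output` workflow command in favour of appending to the file named by `$GITHUB_OUTPUT`. Here is a sketch of the same `getcommit` step written in that style, assuming your runner provides `$GITHUB_OUTPUT`:

```yaml
- name: set commit to output
  id: getcommit
  run: |
    # jq -r prints the raw string, so the surrounding quotes need no stripping
    prsha=$(echo "$response" | jq -r '.[-1].sha')
    # Assumption: the runner exposes the GITHUB_OUTPUT file for step outputs
    echo "sha=$prsha" >> "$GITHUB_OUTPUT"
  env:
    response: ${{ steps.get_PR_commits.outputs.data }}
```

The output is still read as `${{ steps.getcommit.outputs.sha }}`, so the checkout step does not change.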
Step-3: Create an Image of the application
In this stage we will Dockerize the application from the repository. For this, the repository needs a Dockerfile that can be used to create the application image. Here, let us suppose that the Dockerfile is in the build directory under the repository root. The application image is then created by running a docker build command against that Dockerfile from a step in the CI YAML.
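A minimal sketch of such a build step, assuming the Dockerfile sits at `build/Dockerfile` and using `<image-name>:<image-tag>` as placeholders for your own image reference:

```yaml
- name: Build docker image
  run: |
    # -f points at the Dockerfile; the trailing "." uses the repo root as build context
    docker build -f build/Dockerfile -t <image-name>:<image-tag> .
```

The same name and tag are reused later when the image is loaded into the cluster.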
Step-4: Create a KinD cluster
As discussed, running Chaos Actions requires a cluster, and we will create a KinD cluster inside the CI workflow itself. Since the ubuntu-latest image already contains the kind installation, we only need to spin up a cluster. To use the version of kind that ships with the image, we just run the kind create cluster command; to use a specific version, we add an action for that. For this blog I’ll be using kind version 0.7.0, which can be installed with the following lines of code:
# Install and configure a kind cluster
- name: Installing Prerequisites (KinD Cluster)
  uses: engineerd/setup-kind@v0.5.0
  with:
    version: "v0.7.0"
- name: Configuring and testing the Installation
  run: |
    kubectl cluster-info --context kind-kind
    kind get kubeconfig --internal >$HOME/.kube/config
    kubectl get nodes
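If the kind version preinstalled on the runner is good enough, the setup action can be skipped. A sketch of that variant, assuming `kind` and `kubectl` are already on the runner's PATH:

```yaml
- name: Create KinD cluster with default values
  run: |
    # Creates a single-node cluster named "kind" and adds it to the kubeconfig
    kind create cluster
    kubectl cluster-info --context kind-kind
    # Block until every node reports Ready (the 120s timeout is an arbitrary choice)
    kubectl wait --for=condition=Ready nodes --all --timeout=120s
```

This sketch only creates and checks the cluster. It does not replace the `kind get kubeconfig --internal` export from the step above.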
Step-5: Load the image inside the node of the cluster
After creating the cluster, we need to load the application image built in Step-3 onto the node of the cluster, so that it can be used locally for testing.
- name: Load image on the nodes of the cluster
  run: |
    kind load docker-image --name=kind <image-name>:<image-tag>
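To confirm that the image really landed on the node, the node's container runtime can be queried. A sketch, assuming the default cluster name `kind`, whose single node runs as the Docker container `kind-control-plane`:

```yaml
- name: Verify the image is present on the node
  run: |
    # crictl ships inside the kind node image; this lists the images the node knows about
    docker exec kind-control-plane crictl images
```

If the image is missing from the list, the pod in the next step will try to pull it from a registry instead of using the local copy.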
Step-6: Run a pod with the application container
Now we will run the application container in a pod. Different chaos tests will be performed at the application level and at the node level, and the health of the pod will be verified after inducing chaos. You can start from a sample pod manifest: replace the nginx image with your image and place the manifest at some location (say /path/to/application) in your repository. In this step we create the application pod via the following command and wait a few seconds for it to become ready.
- name: Deploy a sample application for chaos injection
  run: |
    kubectl apply -f /path/to/application
    sleep 30
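A fixed `sleep 30` can be too short on a slow runner and wasteful on a fast one. Here is a sketch of an alternative that waits on the actual resource status, assuming the manifest creates a Deployment in the current namespace:

```yaml
- name: Deploy a sample application for chaos injection
  run: |
    kubectl apply -f /path/to/application
    # Wait until every Deployment reports Available (timeout value is an arbitrary choice)
    kubectl wait --for=condition=Available deployment --all --timeout=120s
```

For a bare Pod manifest, `--for=condition=Ready pod --all` would be the equivalent condition.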
Step-7: Setup kubeconfig ENV for GitHub Actions
Now we need to export KUBE_CONFIG_DATA, containing the kubeconfig of the KinD cluster encoded in base64, so that it can be used by the Chaos Action.
- name: Setting up kubeconfig ENV for Github Chaos Action
  run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
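The `::set-env` workflow command has since been disabled on GitHub-hosted runners. A sketch of the same export using the environment file instead, assuming your runner provides `$GITHUB_ENV`:

```yaml
- name: Setting up kubeconfig ENV for Github Chaos Action
  run: |
    # base64 -w 0 keeps the encoded kubeconfig on a single line
    # Assumption: the runner exposes the GITHUB_ENV file for exporting variables
    echo "KUBE_CONFIG_DATA=$(base64 -w 0 ~/.kube/config)" >> "$GITHUB_ENV"
```

Variables written this way become available to the steps that follow, which is what the Chaos Action step needs.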
Step-8: Run GitHub Chaos Actions on the application
Here comes the step where we use GitHub Chaos Actions for chaos testing. The experiments and their parameters are controlled by the action's ENV, as explained on the Chaos Action page.
Example:
- name: Running Litmus pod delete chaos experiment
  if: startsWith(github.event.comment.body, '/run-e2e-pod-delete') || startsWith(github.event.comment.body, '/run-e2e-all')
  uses: mayadata-io/github-chaos-actions@v0.1.1
  env:
    INSTALL_LITMUS: true
    EXPERIMENT_NAME: pod-delete
    EXPERIMENT_IMAGE: litmuschaos/ansible-runner
    EXPERIMENT_IMAGE_TAG: ci
    IMAGE_PULL_POLICY: IfNotPresent
    LITMUS_CLEANUP: true
Step-9: Push the application image to the registry
This step will execute only when all other steps have run successfully. That means our application has passed the chaos test and we are good to push the image to the registry.
- name: Publish to Docker Repository
  uses: elgohr/Publish-Docker-Github-Action@master
  with:
    Name: <image-name>
    username: ${{ secrets.DUSER }}
    password: ${{ secrets.DPASS }}
    dockerfile: build/Dockerfile
    tags: "<image-tag>"
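If you would rather push the exact image that was just tested instead of handing the Dockerfile to a publish action, plain Docker commands are an option. A sketch, assuming the same `DUSER` and `DPASS` secrets and Docker Hub as the registry:

```yaml
- name: Push the tested image
  env:
    DUSER: ${{ secrets.DUSER }}
    DPASS: ${{ secrets.DPASS }}
  run: |
    # --password-stdin keeps the password out of the command line
    echo "$DPASS" | docker login -u "$DUSER" --password-stdin
    # Placeholders: use the same name and tag that were built and loaded earlier
    docker push <image-name>:<image-tag>
```

The trade-off is one more step to maintain, in exchange for certainty that the pushed image is the one the chaos test ran against.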
Step-10: Delete Kind Cluster
This is a cleanup step where we remove the cluster, which also removes the application pod and the chaos components. The cleanup stage should execute even if an experiment fails.
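As a step, this cleanup could look like the following sketch; `if: ${{ always() }}` makes the step run regardless of whether earlier steps failed:

```yaml
- name: Deleting KinD cluster
  # always() evaluates to true even when a previous step has failed
  if: ${{ always() }}
  run: kind delete cluster
```

With no `--name` flag, `kind delete cluster` removes the default cluster named `kind`, which is the one created earlier.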
Putting all the steps together, the complete workflow looks like this:
name: Litmus-CI
on:
  push:
    branches: [master]
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      # Using the last commit id of pull request
      - uses: octokit/request-action@v2.x
        id: get_PR_commits
        with:
          route: GET /repos/:repo/pulls/:pull_number/commits
          repo: ${{ github.repository }}
          pull_number: ${{ github.event.issue.number }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: set commit to output
        id: getcommit
        run: |
          prsha=$(echo $response | jq '.[-1].sha' | tr -d '"')
          echo "::set-output name=sha::$prsha"
        env:
          response: ${{ steps.get_PR_commits.outputs.data }}
      - uses: actions/checkout@v2
        with:
          ref: ${{steps.getcommit.outputs.sha}}
      - name: Build docker image
        run: |
          sudo docker build -f build/ansible-runner/Dockerfile -t litmuschaos/ansible-runner:ci .
      # Install and configure a kind cluster
      - name: Installing Prerequisites (KinD Cluster)
        uses: engineerd/setup-kind@v0.4.0
        with:
          version: "v0.7.0"
      - name: Configuring and testing the Installation
        run: |
          kubectl cluster-info --context kind-kind
          kind get kubeconfig --internal >$HOME/.kube/config
          kubectl get nodes
      - name: Load image on the nodes of the cluster
        run: |
          kind load docker-image --name=kind litmuschaos/ansible-runner:ci
      - name: Deploy a sample application for chaos injection
        run: |
          kubectl apply -f https://raw.githubusercontent.com/mayadata-io/chaos-ci-lib/master/app/nginx.yml
          sleep 30
      - name: Setting up kubeconfig ENV for Github Chaos Action
        run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
      - name: Running node-memory-hog chaos experiment
        uses: mayadata-io/github-chaos-actions@v0.1.1
        env:
          INSTALL_LITMUS: true
          EXPERIMENT_NAME: node-memory-hog
          EXPERIMENT_IMAGE: litmuschaos/ansible-runner
          EXPERIMENT_IMAGE_TAG: ci
          IMAGE_PULL_POLICY: IfNotPresent
          LITMUS_CLEANUP: true
      - name: Publish to Docker Repository
        uses: elgohr/Publish-Docker-Github-Action@master
        with:
          Name: <image-name>
          username: ${{ secrets.DUSER }}
          password: ${{ secrets.DPASS }}
          dockerfile: build/Dockerfile
          tags: "<image-tag>"
      - name: Deleting KinD cluster
        if: ${{ always() }}
        run: kind delete cluster
View the Result:
Now to view the result of the run we need to navigate to the main page of the repository. Under your repository name, click Actions.
A list of all workflow runs will appear. Select the latest one to view its logs and get an idea of what happened during the execution of the chaos test.
The complete log lists the output of every step in the workflow.
If you want to create and use your own experiment or test in the GitHub Chaos Action, you can create it using Litmus with the help of its SDK. For more help, you can also raise an issue in the Litmus repository. The community is extremely active and will get back to you.
NOTE: If you're new to GitHub Actions and want to create a simple GitHub workflow, please also read Part-1 of the action series.
Conclusion:
By using GitHub Chaos Actions along with KinD in the CI workflow, you can automate, customize, and execute your software development workflows right in your repository. You can use the Chaos Action in a completely customized way to check the performance of your application for every commit coming in through a Pull Request. This helps both developers and testers maintain the quality of the software. So, what do you think? What else can we achieve using Chaos Actions, and what other tests can be performed apart from what we have now? Kubernetes experts are welcome to comment and make suggestions!
Are you an SRE or a Kubernetes enthusiast? Does Chaos Engineering excite you?
Join Our Community On Slack For Detailed Discussion, Feedback & Regular Updates On Chaos Engineering For Kubernetes: https://kubernetes.slack.com/messages/CNXNB0ZTN
(#litmus channel on the Kubernetes workspace)
Check out the Litmus Chaos GitHub repo and do share your feedback: https://github.com/litmuschaos/litmus
Submit a pull request if you identify any necessary changes.
Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
LitmusChaos is an open source Chaos Engineering platform that enables teams to identify weaknesses and potential outages in infrastructures by inducing chaos tests in a controlled way. Developers and SREs can practice Chaos Engineering with LitmusChaos as it is easy to use, based on modern Chaos Engineering principles, and community collaborated. It is 100% open source and a CNCF project.
LitmusChaos takes a cloud-native approach to create, manage and monitor chaos. The platform itself runs as a set of microservices and uses Kubernetes custom resources (CRs) to define the chaos intent, as well as the steady state hypothesis.
At a high level, Litmus comprises:
Chaos Control Plane: A centralized chaos management tool called chaos-center, which helps construct, schedule and visualize Litmus chaos workflows