Chaos Carnival 2022 was held on January 27-28. Although it was a virtual event, it was far from an everyday Chaos Engineering conference: it drew strong traction from the chaos community on both days. With 30+ Chaos Sessions, 2 Live Panel Discussions, 2 Chaos Workshops, and 1000+ attendees including CEOs, CTOs, VPs, Directors, SREs, DevOps practitioners and many more from the industry, Chaos Carnival 2022 made a statement as the flagship chaos engineering conference of the year.
As one of the organizing team members and the lead for the LitmusChaos community, I was delighted to witness around 10 talks and 2 workshops featuring LitmusChaos, each telling a unique story and covering a different aspect of the technology, delivered by stalwart speakers from around the world.
The conference kicked off on the 27th of January at 7:30 AM PST with an amazing keynote by Mikolaj Pawlikowski, followed by the live Chaos Panel discussion hosted by LitmusChaos SIG-Docs lead Divya Mohan. The panel featured community members Laura Henning, Katie Gamanji (Chief of Future Founders, OpenUK), Dushyant Sahni (Global Practice Leader - ISV and Horizontal Tech, Nagarro), Mahesh Venkataraman (Technology Leader - Cloud Advisory, Accenture) and Manivannan Chandrasekaran (DevOps Engineering Manager, HaloDoc). They shared inputs and stories from their own experience on a range of topics: the misunderstandings and misconceptions that individuals and enterprises hold when starting their Chaos Engineering journey, learnings to help the community get started with the practice, understanding use cases, hiring chaos practitioners, automating Chaos Engineering, introducing it as a practice in large enterprises, identifying the most important metrics for practicing chaos, and adopting Chaos Engineering for legacy applications.
The live panel drew an amazing community response and answered many of the questions community members typically have, both before getting started with Chaos Engineering and after trying their hand at chaos.
Check out the full recording for Chaos Panel Discussion:
In the rest of this blog, we cover the talks that featured LitmusChaos in one way or another, telling stories and describing use cases that have moved the community forward as a whole and driven demand for LitmusChaos as one of the go-to toolsets for ensuring reliability.
For each LitmusChaos talk, we share a short abstract of what the speaker covered along with the recording, which is available on the ChaosNative YouTube channel. I could only watch a small portion of the sessions, but the ones I’m mentioning are well worth watching if you missed them live.
So let's get started...
Day 1
The Freedom of Kubernetes requires Chaos Engineering to shine in production: Henrik Rexed
The first talk of the conference after the keynote and panel discussion was delivered by Henrik Rexed from Dynatrace, who has been working on the cloud-native side of things for some time now and runs the "Is It Observable" YouTube channel, which contains tutorials on various cloud-native tools. He believes that Kubernetes is an amazing technology, but that it also requires a lot of configuration to make sure that our workloads are reliable.
In his session, he explains how to use Chaos Engineering to improve the reliability of our cluster and how complex it is to measure and validate the impact of our settings on our end-users.
From Monitoring to Observability: Left Shift your SLOs with Chaos: Michael Friedrich
In this talk, Michael shares his debugging horror stories: situations where he would have loved to have insights before they caused production problems for all teams involved.
Michael also introduces a developer's view on using cloud-native resources and on mistakes turned into visible failures. He takes us through the first steps with metrics, SLOs and app instrumentation, quality gates in a CI/CD pipeline, and chaos engineering, drawing on his experience over the past 15 years.
Michael also believes that simulating a production incident to test reliability and observability can be a challenge. Chaos engineering brings a new building block into DevOps and observability platforms, which he illustrates with an example of how the engineering teams at LEGO tackle their Ops challenges with Chaos Engineering.
Chaos Engineering 2022: Saiyam Pathak
In this talk, LitmusChaos community member Saiyam Pathak highlights the impact of chaos engineering in the cloud-native ecosystem and the tooling around it.
Saiyam highlights how Chaos Engineering tools have matured over time and are moving towards standardisation, with their communities collaborating on the Chaos Engineering whitepaper so that the tools move in the right direction. He shares his vision of how Chaos Engineering moves forward from here in 2022 and discusses two well-known tools on the cloud-native landscape, LitmusChaos and Chaos Mesh, and how both have evolved over time to tackle various chaos challenges.
Chaos Engineering in Multi-tenant and Hybrid Environments: Karthik S
Karthik begins his talk by covering how, with the advent of Kubernetes-driven application development and deployment, Cloud Native Chaos Engineering has slowly but steadily become an established paradigm. This entails using Kubernetes itself as the substrate and control plane for executing chaos business logic on microservices and their underlying Kubernetes infrastructure in order to evaluate and improve resilience.
He addresses the lingering questions and concerns around how Chaos Engineering fits into an enterprise setting that is largely hybrid, with services hosted on both Kubernetes clusters and “traditional” infra (bare metal, virtual machines, cloud instances making use of platform-managed services, etc.). Under ideal circumstances, an SRE would prefer a single-pane-of-glass approach to managing chaos engineering requirements.
In his talk, he also discusses how Kubernetes can be leveraged, along with the platform APIs of the infrastructure provider, to achieve the desired fault injections, and the best practices associated with this process. He closes by demonstrating this model with VMware as the platform of choice.
The Applications of Non-k8s Chaos Experiments using LitmusChaos: Neelanjan Manna
Neelanjan has been a core contributor to the LitmusChaos project for the past year. He believes that although cloud-native technologies drove the paradigm shift that has enabled businesses to scale up flexibly, the vast majority of systems still run on a non-Kubernetes stack such as bare-metal servers, cloud infrastructure, cloud VMs, etc., and these are just as important from the perspective of service reliability.
In his talk, he takes us through how LitmusChaos simplifies non-Kubernetes Chaos Engineering with its wide range of chaos experiments, which help validate the reliability of an entire business use case.
GitOps meets Chaos Engineering: Sangam Biradar
In this talk, Sangam covers another exciting and new aspect: how one can bring GitOps together with Chaos Engineering.
He uses the Okteto Cloud Platform to run LitmusChaos experiments and explains why GitOps is crucial in the Chaos Engineering space when scaling applications.
Day 2
Chaos Engineering alongside LitmusChaos and Jenkins: Akram Riahi
Chaos engineering is being adopted within a lot of companies, and LitmusChaos community member Akram Riahi has been vital in helping a few of them enable chaos using Litmus. He talks about how LitmusChaos is being implemented at a company called Talend. With automating chaos now a must, he discusses how to integrate chaos engineering into one's Jenkins pipeline, after QA testing the application image and before promoting it to production.
Akram emphasized why it is important to enable developers to inject chaos in their DevOps pipelines as often as they want, and how this procedure can be made easier for developers who are bound to face roadblocks while injecting chaos.
Akram also advised communicating with all DevOps team members about how injecting chaos can create performance problems, or even failures with a blast radius, in a live production environment.
In conclusion, he urged us not to be afraid of failures, as they are instructive and ultimately help build more resilient applications.
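The placement Akram describes, with chaos running after QA testing and before promotion to production, can be illustrated with a small sketch. This is not his Jenkins pipeline; it is a minimal Python model under stated assumptions, and the stage names and the `build_stages` helper are hypothetical placeholders for real QA, chaos and deployment steps.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage passes


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed = []
    for stage in stages:
        if not stage.run():
            break
        passed.append(stage.name)
    return passed


def build_stages(chaos_passes: bool) -> List[Stage]:
    # Hypothetical stage bodies: in a real pipeline each one would call out
    # to the test suite, the chaos tooling, and the deployment step.
    return [
        Stage("qa-tests", lambda: True),
        Stage("chaos-experiment", lambda: chaos_passes),
        Stage("promote-to-production", lambda: True),
    ]


if __name__ == "__main__":
    print(run_pipeline(build_stages(chaos_passes=True)))
    print(run_pipeline(build_stages(chaos_passes=False)))
```

Because the chaos stage sits between QA and promotion, a failed experiment stops the image from ever reaching the promotion step.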
Level-up your organisation with DevSecOps practices & Chaos Engineering: Nik Jain
Nik observes that it has become more and more evident that DevOps and SRE (Modern Ops Teams) have inadvertently conflicting goals. This unintended tussle results in costly downtime and degraded user experience, which erode customer confidence and cause revenue leakage. He addresses how automated, SLO-based quality gates, or gate-keeping mechanisms that automatically assess the quality of software features and releases, give developers early intervention with prescriptive feedback on improvements and optimization, helping avoid unnecessary and expensive production war rooms.
The fun doesn’t stop here. He also sheds light on incorporating security gates into DevOps processes to enable DevSecOps, which helps detect situations like the Log4j vulnerability in a continuous and automated fashion with an all-encompassing full-stack agent, and helps shift left from a reactive SecOps-only approach to early risk detection, mitigation and management. He showcases good practices surrounding DevSecOps and adds Chaos Engineering to the mix for added business resilience, helping SREs develop muscle memory to remediate issues faster and prepare better for both anticipated and unexpected production issues (known and unknown unknowns).
His talk aims to benefit developers, release train engineers, engineering management (VPs/CTOs/Managers), SREs, testers, chaos engineers and more.
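To make the idea of an SLO-based quality gate from Nik's talk concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the tooling shown in the talk: the `Slo` class, the `evaluate_gate` function, and the metric names and thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Slo:
    metric: str              # hypothetical metric name, e.g. "availability"
    threshold: float         # the objective to compare against
    higher_is_better: bool   # True for availability, False for latency


def evaluate_gate(slos: List[Slo], measured: Dict[str, float]) -> List[str]:
    """Return a list of violations; an empty list means the gate passes."""
    violations = []
    for slo in slos:
        value = measured.get(slo.metric)
        if value is None:
            # A missing measurement is treated as a failure, not a pass.
            violations.append(f"{slo.metric}: no measurement")
            continue
        if slo.higher_is_better:
            ok = value >= slo.threshold
        else:
            ok = value <= slo.threshold
        if not ok:
            violations.append(
                f"{slo.metric}: measured {value}, objective {slo.threshold}"
            )
    return violations


if __name__ == "__main__":
    # Assumed objectives for illustration only.
    slos = [
        Slo("availability", 0.999, higher_is_better=True),
        Slo("p95_latency_ms", 300.0, higher_is_better=False),
    ]
    print(evaluate_gate(slos, {"availability": 0.9995, "p95_latency_ms": 250.0}))
    print(evaluate_gate(slos, {"availability": 0.99, "p95_latency_ms": 250.0}))
```

The returned violation messages are the "prescriptive feedback" part of the gate: a release is held back with a statement of which objective was missed and by how much.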
Configuring Kubernetes for Reliability with LitmusChaos: Michael Knyazev
Michael begins the talk by highlighting why Chaos Engineering practices are relevant for maintaining an effective CI/CD pipeline that ensures system reliability. However, Chaos Engineering experiments are traditionally time-consuming and potentially unsafe to run, as they can have severe undesirable effects. Hence he emphasises automating Chaos Engineering for safe execution of chaos, along with informative logging in line with good QA practices.
He sheds light on how LitmusChaos helps cover all these requirements. It can be used with ChaosCenter or with just the Litmus chaos-operator. ChaosCenter handles the different aspects of running chaos engineering at scale and as a collaborative practice. It makes use of Argo Workflows, a popular choice for orchestrating chaos workflow task containers in Kubernetes.
Open source blueprints of Chaos Workflows are available and can be used as base templates for various Chaos Engineering scenarios. The reliability pipelines are fully Kubernetes native, and a CI tool such as Jenkins can be used to trigger them.
In the second part of the talk, Michael discusses technical implementation details of the reliability pipelines. As a technical demonstration, Michael refers to a Jenkinsfile that configures a reliability pipeline to be run. It contains the various stages that configure the workflow to be run.
Finally, let us take a quick look at the 2 workshops that kept the attendees engaged during the breaks:
Workshop 1: LitmusChaos on Raspberry Pi Cluster
Workshop Abstract: Running K8s on your local system using Minikube, K3s, etc. has been the go-to for many developers working on K8s-based technologies like LitmusChaos. These sometimes require more system performance than we have available, and cloud-based providers are costly in the long run. Raspberry Pi provides a cheaper alternative to upgrading your system or paying for cloud-based services for development needs. In this talk, we will explore LitmusChaos running on Raspberry Pi clusters for use in development environments, starting with a tutorial on how to set up a Raspberry Pi cluster running K8s and then installing LitmusChaos on the cluster to run experiments. We will also take a look at ways we can use this in staging/production environments.
Workshop 2: Using LitmusChaos in ChaosNative Litmus Cloud (CLC) DevOps cycle
Workshop Abstract: Even though an automated CI/CD pipeline enables fast product iterations, provides standardized feedback loops for developers and reduces the chances of manual errors, we can’t predict all of an application’s failure modes. We therefore need solutions that help us discover application-level vulnerabilities, and this is where Chaos Engineering comes into play. In this workshop, the speakers discuss how they have integrated LitmusChaos into the DevOps pipeline of their cloud platform (ChaosNative Litmus Cloud) to do Chaos Engineering.
In the end...
The conference was a testimony to the leap LitmusChaos has made as a tool in the cloud-native ecosystem since its inception. People have adopted, contributed to and loved the tool for their Chaos Engineering needs, and the community keeps growing with content and amazing talks like these.
I hope this blog helps you with some good insights and the talks become exemplary for the community. I would like to thank Henrik Rexed, Michael Friedrich, Saiyam Pathak, Akram Riahi, Neelanjan Manna, Sangam Biradar, Raj Das, Adarsh Kumar, Akash Srivastava, Udit Gaurav, Karthik S, Michael Knyazev, and Nik Jain for their valuable contributions, constant support and of course the amazing talks that they have delivered at this year's Chaos Carnival.
I look forward to another edition of Chaos Carnival next year with more such amazing talks, especially on LitmusChaos. Until then do check out all the awesome talks available on the Chaos Carnival Website and stay tuned...
Join the LitmusChaos Community:
Want help with queries, learnings, and contributions? Join the LitmusChaos community on Slack by following these steps:
Step 1: Join the Kubernetes Slack using the following link: https://slack.k8s.io/
Step 2: Join the #litmus channel on the Kubernetes Slack, or use this link after joining the Kubernetes Slack: https://slack.litmuschaos.io/
Looking forward to seeing all the amazing folks from the open source world! :)
Here are some important links for your reference:
LitmusChaos Website: https://litmuschaos.io/
LitmusChaos GitHub Repo: https://github.com/litmuschaos/litmus
LitmusChaos Docs: https://docs.litmuschaos.io/
LitmusChaos YouTube Channel: https://www.youtube.com/channel/UCa57PMqmz_j0wnteRa9nCaw