Make Kubernetes and Platform Engineering Easier Part 1: AIOps

Michael Levan - Sep 7 '23 - Dev Community

One thing in today’s tech world holds true - engineers want the ability to make what they’re doing more efficient. With the hundreds of tools available, various platforms to deploy to, and many decisions to make, this journey can be incredibly difficult. Not because the tools or platforms they’re using are necessarily difficult to use, but because there are simply far too many options available.

Engineers need a single Platform Engineering home to make their lives easier, rather than a sprawl of tools that ultimately causes confusion, tech debt, and a major upkeep burden.

In part 1 of this series, you’re going to learn one of the newest ways to make your life easier: AIOps with Nethopper.

Prerequisites

If you'd like to follow along with the hands-on portion of this blog post, you can sign up for Nethopper for free here: https://www.nethopper.io/

Is AI Just Buzz?

Opinions vary, and engineers will be the first to say so. If you line up ten engineers and ask for their opinion on something, chances are you’ll get eight different answers. Why? Because there’s no single way to do “a thing”. When it comes to AI, that conversation still leaves much to be desired.

The truth is, the whole Generative AI and “how AI can make your life easier” stuff is very buzzy right now. Engineers are seeing it left and right and becoming quite sick of it because they can’t escape it.

However, as with everything that’s “buzzy”, there’s a little bit of truth. If you look at AI more broadly, not just Generative AI, the long and short of it is that it’s about making predictions based on data that already exists. That concept can be an incredible helping hand when it comes to troubleshooting. Think about it - if you could have predictive analysis of how your environment is going to react and perform based on load, users, etc., that would be incredibly helpful. It could ultimately even help reduce the need for 24/7 on-call, or at least reduce the calls that on-call engineers receive.

Tying AI and ML Together

When you hear about AI, it’s almost impossible not to also talk about Machine Learning (ML). ML is how AI learns from its data. Remember, AI is nothing without a set of data to go off of. The data is collected into data sets, and Machine Learning trains data models on those data sets. You could, of course, create your own data set and rules by hand if you wanted to, but Machine Learning has the ability to do the heavy lifting for you.
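To make the data set, model, and prediction flow a bit more concrete, here’s a minimal sketch in plain Python. The numbers are invented for illustration (hypothetical requests-per-second and latency samples), and a simple least-squares line stands in for whatever model a real AIOps product would use.

```python
from statistics import mean

# Hypothetical data set: (requests per second, observed latency in ms).
# These values are made up for illustration only.
samples = [(100, 120.0), (200, 140.0), (300, 160.0), (400, 180.0)]


def fit_line(points):
    """Train a tiny "model": fit y = slope * x + intercept by least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Use the trained model to estimate latency at a load it has not seen."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(samples)
print(f"Predicted latency at 500 req/s: {predict(model, 500):.1f} ms")
```

The point isn’t the math. It’s that the prediction comes entirely from data that already exists, which is the same idea behind predicting how an environment will behave under load.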

In the world of Kubernetes, you’ll typically see Kubeflow used. Kubeflow is a platform for running ML workloads on Kubernetes. It includes automated Machine Learning, model serving, model training, and model development.

AIOps and Generative AI have the same workflows as standard AI. They both need data. From an AIOps perspective, the data comes from the Kubernetes cluster. If you’re looking at AIOps when using Nethopper, you can see that it gives you the ability to automatically diagnose a Kubernetes cluster for any issues. The question is - how does it work?

Image description

How it works ties back to what you just learned, which is data sets and data models. Data is collected about how a Kubernetes cluster should be performing, best practices, common issues, etc., and the scans are then run against that data.
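As a rough illustration of scanning a cluster against known data, here’s a small Python sketch. It is not how Nethopper is implemented. The `KNOWN_ISSUES` table, the `Finding` type, and the pod snapshot are all hypothetical, and a static lookup stands in for a trained model. The status reasons themselves (`CrashLoopBackOff`, `ImagePullBackOff`, `OOMKilled`) are ones Kubernetes really reports.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    pod: str
    problem: str
    suggestion: str


# Hypothetical knowledge base: known bad states and a suggested fix for each.
KNOWN_ISSUES = {
    "CrashLoopBackOff": "Check the container logs and the startup command.",
    "ImagePullBackOff": "Verify the image name, tag, and registry credentials.",
    "OOMKilled": "Raise the memory limit or reduce the workload's memory use.",
}


def scan(pods):
    """Compare each pod's reported state against the known-issue data."""
    findings = []
    for pod in pods:
        suggestion = KNOWN_ISSUES.get(pod["reason"])
        if suggestion is not None:
            findings.append(Finding(pod["name"], pod["reason"], suggestion))
    return findings


# Hypothetical cluster snapshot. A real scan would read this from the cluster.
pods = [
    {"name": "web-7d9f", "reason": "Running"},
    {"name": "api-5c2b", "reason": "CrashLoopBackOff"},
    {"name": "worker-1a3e", "reason": "OOMKilled"},
]

for finding in scan(pods):
    print(f"{finding.pod}: {finding.problem} -> {finding.suggestion}")
```

With this sample snapshot, the scan flags `api-5c2b` and `worker-1a3e` and leaves the healthy pod alone. Each finding carries where the issue is, what it is, and a suggested fix.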

When you think about it, it’s almost like the next level of automated monitoring. This, as discussed in the “Is AI Just Buzz?” section, gives engineers, for example, a much easier time when they’re on-call. For a lot of engineers, the idea behind AIOps is to handle the low-hanging fruit. The goal is for an issue to come up, and for AIOps to see it and automatically fix it.
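That “see it and fix it” goal can be sketched as a small dispatch table in Python. Everything here is an assumption for illustration: the `AUTO_FIXES` mapping, the `delete_pod` placeholder, and the issue format are hypothetical, and a real system would call the Kubernetes API rather than return a string.

```python
def delete_pod(pod_name):
    # Placeholder action: a real system would call the Kubernetes API here.
    return f"deleted {pod_name}"


# Hypothetical mapping of "low-hanging fruit" issues to automatic fixes.
# Evicted pods linger in a failed state, so cleaning them up is low risk.
AUTO_FIXES = {"Evicted": delete_pod}


def handle(issue):
    """Fix the issue automatically if a safe action is known, else escalate."""
    fix = AUTO_FIXES.get(issue["reason"])
    if fix is None:
        return f"paged on-call for {issue['pod']} ({issue['reason']})"
    return fix(issue["pod"])


print(handle({"pod": "batch-42", "reason": "Evicted"}))
print(handle({"pod": "api-5c2b", "reason": "OOMKilled"}))
```

The design choice worth noting is the fallback: anything without a known safe fix still goes to a human, so automation only trims the routine calls.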

Where Nethopper Comes Into Play

The idea behind AIOps ultimately ties into the idea of Platform Engineering: making the platforms you’re working on more efficient for engineers and developers who may not have the underlying expertise. To get there, you need a tool or product to perform the actions for you. You could set up your own method of using AIOps, or even build your own tool, but that would defeat the purpose of making things easier for yourself. Ultimately, there’s no reason to reinvent the wheel when a tool already exists.

This is where Nethopper comes into play.

Throughout this series, you’ll see various ways Nethopper helps with Platform Engineering, but let’s start with the AIOps piece.

Quickstart Cluster Configuration

To start, you’ll need a network and a cluster available. In Nethopper, a cluster is a Kubernetes cluster and a network is a set of clusters.

First, create your network. You can leave everything default for now.

Image description

Image description

Next, define your cluster. A cluster definition consists of the name, the location, and where it’s installed. For example, you can choose AKS if you’re running Kubernetes in Azure or EKS if you’re running Kubernetes in AWS.

Image description
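If it helps to picture the relationship, a network holding a set of clusters, each with a name, a location, and where it’s installed, can be modeled like this in Python. The class and field names are hypothetical and are not Nethopper’s API. They only mirror the fields described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    name: str
    location: str      # e.g. a cloud region
    distribution: str  # where it's installed, e.g. "AKS" or "EKS"


@dataclass
class Network:
    name: str
    clusters: List[Cluster] = field(default_factory=list)

    def add(self, cluster: Cluster) -> None:
        """Attach a cluster to this network."""
        self.clusters.append(cluster)


# Hypothetical names and regions, for illustration only.
network = Network("demo-network")
network.add(Cluster("prod-east", "eastus", "AKS"))
network.add(Cluster("prod-west", "us-west-2", "EKS"))
print([c.name for c in network.clusters])
```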

After the configuration for the cluster is complete, you’ll see it up and running.

Image description

For a deeper dive into the network/cluster setup process, you can take a look at the Nethopper quick start documentation.

AIOps

Once the cluster and network are configured, that’s pretty much it. You don’t have to set up Machine Learning, data sets, data models, or anything of the sort. It’s all handled for you with Nethopper.

Under the Observability section of Nethopper, you’ll see the AIOps button.

Image description

Once you click on it, you’ll see one of two things: no issues found or issues found. If there are no issues found, you’ll see an output similar to the screenshot below.

Image description

If issues are found, you’ll see them listed.

Image description

Once you click on one of the issues, you’ll see the results. The results indicate where the issue is occurring, the error message, and more importantly, how you can fix it.

Image description

This is where the power of AIOps comes into play. You don’t have to spend hours troubleshooting an issue or a concern. Instead, you can get suggestions generated for you.
