DevOps and SRE: 2022 Advice

Michael Levan - Dec 24 '21 - Dev Community

As we wrap up 2021 and head into 2022, it’s time to start thinking about where our focus should be from a technical perspective in 2022 to ensure we’re staying up to date and are constantly keeping ourselves ahead of the learning curve.

2021 was an awesome year in tech and it opened our eyes up to a few key realities:

  • The cloud doesn’t always stay up
  • Application Performance Monitoring (APM) is crucial for any app

In this blog post, you’re going to learn what to think about in 2022 in the DevOps and SRE space. Although this isn’t an exhaustive list, it should give you a good foundation.

Application Monitoring

Application monitoring has always been important, but it was never as visible as it is now. In the past, a lot of engineers focused on monitoring systems because that’s where the application was running. Now that there are other styles of systems like containers and serverless, it’s less about the underlying infrastructure and more about how the code is running.

Testing if applications perform the way that you’re expecting and can handle the load from a user perspective can be broken down into two categories:

  • Application Performance Monitoring (APM)
  • Load testing

Some popular APM tools are New Relic and Datadog. A few popular load testing tools are Apache JMeter and Vegeta. Whichever you choose, make sure it lets you both monitor the performance of an application and test that performance under load.
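To make the load testing half of that concrete, here is a minimal sketch in Python using only the standard library. It is not how JMeter or Vegeta work internally; it only shows the basic idea of sending concurrent requests and summarizing latency. The names `run_load_test`, `timed_call`, and `fake_request` are made up for this example, and `fake_request` is a stand-in you would replace with a real call to your application.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(request_fn):
    """Run one request and return its latency in seconds."""
    start = time.perf_counter()
    request_fn()
    return time.perf_counter() - start


def run_load_test(request_fn, total_requests=50, concurrency=5):
    """Send total_requests calls through a small thread pool and summarize latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(
            pool.map(lambda _: timed_call(request_fn), range(total_requests))
        )
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    # The "inclusive" method keeps the result within the observed range.
    p95 = statistics.quantiles(latencies, n=100, method="inclusive")[94]
    return {
        "requests": len(latencies),
        "mean_s": statistics.mean(latencies),
        "p95_s": p95,
        "max_s": max(latencies),
    }


def fake_request():
    # Hypothetical stand-in for a real HTTP call to your application.
    time.sleep(0.01)


if __name__ == "__main__":
    print(run_load_test(fake_request))
```

Watching the 95th percentile rather than only the average is a common choice, because a handful of slow requests can hide behind a healthy-looking mean.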

Systems Monitoring

When it comes to systems monitoring, you have to think about systems in a different way than just virtual machines. Systems can now include:

  • Containers
  • Kubernetes
  • Serverless
  • Bare metal
  • Virtual machines

Each of those systems can host an application, which means it needs to be monitored properly.

Containers need to be monitored to ensure the binary/app is running properly. Kubernetes needs to be monitored to ensure etcd, the Kubernetes API, the control plane, and many other components are working the way they should be. Serverless needs to be monitored to ensure the application is running as expected and is scaling the way it should be. Bare metal and virtual machines need to be monitored not only for the application, but also for RAM, CPU, and other system resources.
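Whatever the system type, most resource monitoring comes down to comparing a measurement against a limit. Here is a rough sketch of that idea in Python. The metric names, the threshold values, and the helpers `find_breaches` and `disk_percent` are assumptions made for this example; a real setup would pull these numbers from a monitoring agent rather than a hand-written dictionary.

```python
import shutil


def disk_percent(path="/"):
    """Return the percentage of disk space used on the volume containing path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def find_breaches(metrics, thresholds):
    """Return (name, value, limit) for every metric above its threshold."""
    breaches = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            breaches.append((name, value, limit))
    return breaches


if __name__ == "__main__":
    # Hypothetical limits; tune these for your own workloads.
    thresholds = {"cpu_percent": 85.0, "memory_percent": 90.0, "disk_percent": 80.0}
    # CPU and memory are hard-coded sample values; only disk is measured here.
    metrics = {
        "cpu_percent": 42.0,
        "memory_percent": 93.5,
        "disk_percent": disk_percent("/"),
    }
    for name, value, limit in find_breaches(metrics, thresholds):
        print(f"ALERT: {name} is {value:.1f}, above the limit of {limit:.1f}")
```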

Scalability

Scaling comes in all shapes and sizes, but in recent years, it’s very cloud-focused. Although there are still many enterprises and organizations that have been around for a while running on-prem, most new startups are 100% in the cloud and existing organizations are starting to run several workloads in the cloud.

However, that comes at a cost (in this case, non-financial).

We all know about the three outages in AWS this month. The cloud isn’t the be-all and end-all. We still need scalability across regions and availability zones, just like we do with on-prem workloads. If you would scale on-prem workloads across data centers, you need to do the same thing in the cloud.

There’s a common misconception that if something is running in the cloud, the cloud provider will take care of it for you. That’s not the case. Data centers still go down and servers still crash. It’s up to the engineers to handle the scalability.

Let’s take AWS for example. You can run workloads in us-east-1 and set up VPC peering so you can scale the workloads to us-west-1. The same thing can be done in Azure or other cloud providers. Another option is the multi-cloud model, where you’re running your workload across different cloud providers.
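Once a workload runs in more than one region, something still has to decide where traffic goes when a region is unhealthy. Here is a minimal sketch of that decision in Python. The `pick_region` helper and the health map are made up for this example; in practice the health data would come from your own health checks or a DNS or load balancer failover feature, not a hard-coded dictionary.

```python
def pick_region(preferred_order, health):
    """Return the first healthy region in preference order, or None if all are down."""
    for region in preferred_order:
        # Treat a region with no health data as unhealthy.
        if health.get(region, False):
            return region
    return None


if __name__ == "__main__":
    regions = ["us-east-1", "us-west-1"]
    # Hypothetical health check results: the primary region is down.
    health = {"us-east-1": False, "us-west-1": True}
    print(pick_region(regions, health))  # prints us-west-1
```

The point is less the code than the design choice: the failover path has to exist and be exercised before the outage, because the cloud provider will not choose a second region for you.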

Repeatability Instead Of Automation

I still remember the days when automation started to become more and more popular. Many engineers thought that it would take away their jobs or that they would automate themselves out of a job.

As luck would have it, automation sort of made our jobs harder.

The funny thing about automation is even if something is automated, that doesn’t mean it’s better. It could simply mean that you’re literally failing or crashing something faster.

Because of that, I propose a new way of thinking; instead of thinking about automation, think about repeatability.

Automation is great, but as previously explained, you could end up automating an error-prone workload. The only difference is that whatever you’re deploying will fail faster. Instead, when you’re thinking about automation, ask yourself: does this manual task need to be repeatable? A few common examples are:

  • If an application is being deployed, it’ll need updates, patches, and hot fixes. That means it should be in a CI/CD pipeline.
  • If you need to deploy a VM for an application to run, that means you’ll need more VMs in the future, which means you should automate the VM creation process.

Tasks should be automated, but they shouldn’t be automated just for the sake of writing a script because it looks cool. Tasks should be automated with repeatability in mind.
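One way to picture repeatability is a task that is safe to run over and over: it describes the end state you want and only acts on what is missing. Here is a small sketch of that in Python. The in-memory `inventory` dictionary and the `ensure_vms` helper are made up for this example and stand in for a real cloud API or a tool like Terraform.

```python
def ensure_vms(inventory, desired):
    """Create only the VMs missing from inventory and return the names created."""
    created = []
    for name, spec in desired.items():
        if name not in inventory:
            # Stand-in for a real "create VM" call to a cloud provider.
            inventory[name] = dict(spec)
            created.append(name)
    return created


if __name__ == "__main__":
    inventory = {}  # hypothetical record of VMs that already exist
    desired = {
        "app-01": {"cpus": 2, "memory_gb": 4},
        "app-02": {"cpus": 2, "memory_gb": 4},
    }
    print(ensure_vms(inventory, desired))  # first run creates both VMs
    print(ensure_vms(inventory, desired))  # second run creates nothing
```

Running it twice gives the same end state, which is the difference between a script that happens to automate a task and a task that is actually repeatable.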

Think About Leadership

In one way or another, engineers need to be leaders. We can no longer just be robots that write code and make tech happen. We need to explain to leadership teams and management why we’re doing something and why it’s beneficial. We need the ability to explain what we’re doing from a technical perspective and why it’s important to the business.

I’m not saying that you need to sit in a board meeting and showcase 20 PowerPoint slides, but you do need to understand the business impact of a tech decision, why it’s important to the business, and how to explain that to the leadership team.

At the end of the day, management is counting on good engineers to explain situations.

Everyone Is A Developer

I get heat for saying this so much, but it’s fine. I still think it’s important to say.

Everyone is a developer.

Before you close out of your web browser and never talk to me again, hear me out.

There’s a notion that if you’re a developer, that means you’re building applications or you’re building the next Twitter or Instagram. This is so far from the truth.

If you’re writing Terraform code, you’re a developer. If you’re writing PowerShell or Python scripts, you’re a developer.

By definition, a developer is someone who creates computer software. By definition, computer software is a set of instructions that tells a computer how to work.

Your code, regardless of whether it’s application code, infrastructure code, or automation code, is a set of instructions that tells a computer what it should be doing.

In 2021, and heading into 2022, we’re seeing more and more engineers needing to learn how to write code to automate tasks, make workloads repeatable, and move faster. Writing code will always be around in one way or another.

Yes, you’re a developer.
