This is a transcription of a video podcast episode. Watch it on YouTube or listen to it in your podcast app.
In the first-ever episode of the DevOps Speakeasy podcast, we interviewed Anton Weiss on DevOps consultancy. These JFrog podcasts are designed specifically to deliver the latest and greatest in DevOps practices while talking with industry experts who share their technological experiences and provide insight into simplifying your organization’s operations.
Baruch:
Welcome to the first episode of our DevOps Speakeasy podcast. We have an amazing guest, Anton Weiss, an international DevOps rockstar and a speaker at many conferences, including our very own Yalla DevOps. We're going to get to Anton very soon. A couple of words about the Speakeasy and the podcast format. DevOps Speakeasy was born half a year ago as a set of interviews that we do with speakers and celebs in the DevOps world. Usually, the interviews are recorded live at conferences, where we just grab people on the show floor or find a conference room and interview them for a very short time (8-10 minutes) about one topic.
Even though no one is going to conferences anymore, we still want to speak with awesome people, so we decided to retrofit this format into a podcast because, as you know, everybody is podcasting and no one is podlistening, including us. That means not face to face, but in our virtual studios, with a longer format. We aim for an hour-long episode, so we have plenty of time to discuss all kinds of things and go into all kinds of topics, and we'll see how it goes.
We'll probably have our share of five followers, including Anton's mom and a couple more, but we hope it will be interesting for you. It's for sure going to be interesting for us. Today the amazing Kat Cosgrove, a developer advocate at JFrog, is here with us. And myself, I'm Baruch Sadogursky, Head of DevOps Advocacy at JFrog. And, as I mentioned, Anton Weiss. So let's get to it. Who are you, Anton?
Anton:
Wow, that's a philosophical question. Well, so who am I? I'm a human being. For the last 30 years, I've been living in Israel, and before that, I was born and brought up in Russia. For the last 20 years, and that's what's relevant for the topic of this podcast, I've been dealing with information technology. I started out as a developer writing code in C and C++. Then I moved on to a number of different integration positions, or whatever they were called at the time. With time, the things that I was doing started being called configuration management.
And then at a certain point, Patrick Debois invented a new wonderful word: DevOps. Since then, that's what I'm all about. For the last four years, I've been doing this for myself, actually doing it for my customers, for my clients. I'm the head and founder of a boutique consulting company called Otomato Software, where we help our clients, large enterprise companies and small startups, make DevOps easier. So DevOps Speakeasy makes DevOps easy.
Baruch:
Exactly. And speakeasy about it. DevOps consulting, I've tried to get my head around it. What does it look like? What is it about? So how does it work? You come to the company and then you do what? Do you bring DevOps engineers with you?
Anton:
That's the first thing I do. I come to them and I say, "You shouldn't have any DevOps engineers." And then I take a couple of thousand bucks and I'm out the door. So basically, that's DevOps consulting.
Kat:
So great, are you hiring?
Anton:
Well, this is hard to sell, don't be mistaken. I never call this DevOps consulting, even though the word DevOps is all around me and I say it many times a day. But still, what I do, what I like to say I do, is software delivery optimization.
Organizations create software. They want to make this software usable by their users, so the software needs to be delivered, and they hit hardships along the way. That's where I see our expertise: understanding where the roadblocks are, where the bottlenecks are, analyzing the whole process, and helping find ways to make this thing easier, faster, more fun.
This includes, of course, all of the components: the processes, the tools, the people, and the information. We always talk about people, process, tools. We don't talk enough about the flow of information around this thing, even though it's no less important. Sometimes it's even more important, because the information is what flows through the pipeline.
Baruch:
But that's super broad. I mean, software delivery optimization is everything. Starting with picking the right programming language, having good developers, and using the right design patterns, it's stuff that we almost never associate with DevOps. Saying "I'm here to optimize your software delivery" is like saying "I'm here to make your life better. Your coffee brand is wrong." It's everything. But people want tangible goals because they want tangible results and they want to hold you accountable. So how do you even go about it?
Anton:
Well, generally, consulting is a tricky business. So yeah, you assume your customers want tangible results. They don't always, because they have their consulting budgets and they are accountable for those budgets. And if you don't deliver the results, then they're accountable to their managers. So sometimes, yeah, they do expect results, but they don't really want to measure them, because for some reason the results might not be there, and the results a consultant gets don't always depend on the consultant him or herself.
Because a consultant can only recommend things. A consultant can coach, a consultant can mentor, but in the end, the implementation of the consultant's recommendations is up to the client.
Baruch:
That would never fly with our boss, that I can tell you right now. You need to find very specific people who don't care about the results or about holding you accountable.
Anton:
Well, what I'm trying to say is that many people say they want to see tangible results. The topic of my talk at Yalla DevOps, and of many other talks that I delivered in the last couple of years, was how we measure the results of DevOps. How do we measure the return on investment from DevOps? And the thing I see time after time is that there are very few organizations really investing effort in this. When we do DevOps, we actually want to achieve two things: we want to deliver software faster and with better quality.
If we go faster but things are always broken, then we haven't achieved anything. And if we're achieving good quality but we can't keep up with the market demand, then again, DevOps isn't working. So we need to be measuring across both speed and quality, or as they are also called, "velocity" and "stability." What do they call them in the Accelerate book?
Baruch:
Lead time, deployment frequency, change failure rate, and mean time to recovery.
Anton:
Right. Okay. So there are four very simple measures. How many organizations out there are actually measuring this?
Baruch:
Well, those that Nicole Forsgren comes to and asks those questions. Then, I think, they start caring about it.
Anton:
Probably. But again, what is the State of DevOps Report based upon? It's based upon self-reporting surveys, right? So the question is, are the answers based on data? My guess would be that most people providing answers to the survey are saying, "Well, it's more or less this." They don't have data to back that up, because collecting this data is an investment.
And we don't see organizations really, really investing in doing this because we're very, very busy fixing the deployments. Or we're very busy replacing our current infrastructure with Kubernetes or we're very busy hiring DevOps engineers because they're so hard to find. Because they don't exist. It's very hard to find something that doesn't exist.
Kat:
I know, we've got a whole TV show about finding something that doesn't exist. There's a seven-season-long show about finding Bigfoot. Haven't found him yet. Seven seasons, though.
Baruch:
Yeah. The truth is out there. That's another very long-running TV show about finding stuff that might exist. So, you know what? I'm worried now because what you are saying is that everything is smoke and mirrors. There are no actual measurements, there are no actual results.
You come to people and, excuse me, bullshit them, and they buy your bullshit, and then no one measures anything. And I'm happy for you. You get their money, everybody is happy. But seriously, I personally thought that this is a serious engineering discipline, that we are serious people who deal with data and numbers, argue with data, measure stuff, and have tangible results. What I hear is that none of that is true. You've got me worried. Now you need to calm me down. Seriously, let's talk about it. What you are saying is that no one is measuring anything, no one knows what they're doing, and the improvement is basically just us feeling better about ourselves.
Anton:
Well, that was actually the first part of the show, where I get everybody pissed off at me.
No, I'm not saying that nobody measures anything. What I see is, well, again, I'm biased, because I usually get called in where organizations are in pain. Because if you're not in pain, you usually don't call a consultant. Until the virus breaks out, you're saying, "Okay, we can handle this ourselves."
Kat:
Yeah. I used to work in the data backup industry, and that's something we saw consistently there too: nobody cared about having backups. Businesses didn't care until they got hit with a CryptoLocker virus and were just utterly screwed. And there's some overlap there with DevOps, now that we've got DevSecOps as a thing: worrying about security from the very beginning.
Baruch:
Okay, I hear you. So people come to you when they think they are in pain, and you sell them this smoke and mirrors, that everything will be fine.
Anton:
No, that's not what I'm selling. What I'm saying is, I'm providing measurable results. Here is what you need to have in place in order to measure this, and here is what we are going to do: an assessment process that we've been running for the last three or four years with different organizations.
So we sit down with all the players of this delivery game. The developers, the operations, the testers, the project managers, product managers, whoever is in this game, whoever's providing requirements, whoever is implementing requirements, whoever is building the infrastructure, et cetera, et cetera, et cetera.
And even from that very basic mapping, you can see there are certain bottlenecks. There is the low-hanging fruit that everybody knows is there. Many times, when I come to an organization, they already know where the pain is. Even though sometimes, when we sit down and talk to everybody, we find out that's not the real pain.
They say, "Our build takes three hours," but then you sit down with them and you realize it takes them a week to approve a version so you say maybe the three-hour build is not your biggest bottleneck, so after the assessment, we more or less know what we're going to do.
I cannot promise how much better the organization will become, because it's a process of continuous improvement. But the first thing I'm saying is, we need to start measuring this. And they're saying, "It's very hard to measure. How do we know when our lead time ends, when we have commits at the beginning of the release?" By the way, most organizations out there are not doing continuous delivery. So they're saying, "We're not doing continuous delivery yet, so how do we count the lead time? Is it from the time the commit has been pushed until the time the release is out there? Or until the time we roll it out to the staging environment? Or until the time, whatever, a bug has been found?"
So we need to define these things. It's quite okay if each organization defines them differently, because the important thing is that you start measuring. You define the start point and the end point. And then you can say, "Okay, this is the whole thing." And if the whole thing is not getting faster, probably I'm not optimizing at the right point. Then you start searching for the actual bottlenecks, because as the theory of constraints teaches us, optimizing anywhere but the bottleneck is not optimizing anything at all. So that would be the correct approach.
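[Editor's note: as a rough illustration of Anton's point about defining a start point and an end point and then measuring, here is a minimal Python sketch computing lead time, deployment frequency, and change failure rate from a hypothetical deployment log. The record format and numbers are invented for illustration; mean time to recovery is omitted, since it needs incident data.]

```python
from datetime import datetime, timedelta

# One record per change: "start" is your chosen start point (say, commit
# pushed) and "end" is your chosen end point (say, rolled out to production).
deployments = [
    {"start": datetime(2020, 4, 1, 9), "end": datetime(2020, 4, 2, 17), "failed": False},
    {"start": datetime(2020, 4, 3, 10), "end": datetime(2020, 4, 6, 12), "failed": True},
    {"start": datetime(2020, 4, 7, 11), "end": datetime(2020, 4, 7, 15), "failed": False},
]

# Lead time: how long a change takes from the start point to the end point.
lead_times = [d["end"] - d["start"] for d in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Deployment frequency: deployments per day over the observed period.
period = max(d["end"] for d in deployments) - min(d["end"] for d in deployments)
deploys_per_day = len(deployments) / max(period.days, 1)

# Change failure rate: share of deployments that caused a failure.
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)

print(avg_lead_time, deploys_per_day, change_failure_rate)
```

As Anton says, the exact definitions matter less than picking them and measuring consistently.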
This is how it works in a perfect world. This requires a lot of discipline, this requires not panicking, and this requires you not to be in too much pain. But as I said, most of the companies that I start to work with are already in pain. So first of all, they need a cure for the pain. They would say, "Our production is crashing all the time. We have critical alerts five times a week." So then we say, "I don't care how long your lead time is. The first thing you need to do is take an aspirin." Let's just sort out your alerts. Let's put all of the focus on your production alerts, let's wake up everybody when an alert occurs, and let's make sure that we're reducing these alerts.
At one company that we worked with, we created something that we call the pain board. They had a very unstable environment and things were always crashing. Things were crashing because of the same issues that never got fixed, because they didn't have the focus needed to fix them. So we started putting the issues on a board on a large monitor for everyone to see. And we started having daily meetings where we would look at the board and say, "This issue has come back 50 times now. We need to take care of this first of all, because we don't want it coming back anymore." When you work in this way, slowly but surely, you start reducing the number of issues. You start reducing the noise, you start reducing the pain, and then you can start focusing on the actual delivery problem.
Baruch:
Okay. So the first stage is a kind of firefighting mode, where you just fix the problems which are obvious to everybody. No need for any sophisticated analysis or whatnot. Every deployment that we do fails, so let's fix that.
Back to the bottlenecks, I wonder if you can name the single biggest bottleneck that you see in the majority of your customers or the organizations you speak with. Or maybe not one, maybe a couple, but the biggest pains that you see.
Anton:
Well, that's pretty easy. There was this talk that I was supposed to deliver at DevOops; the subject of the talk was the DevOps Hero's Journey. I was drawing on the hero's journey by Joseph Campbell. What he did was study the myths of different countries and different peoples, things like the myth of Buddha, the myth of Jesus, the myth of Muhammad, whatever. He was saying that all of these stories, all of these myths, follow the monomyth pattern. They're all the same myth. Some of them have a few more steps, some of them have some steps missing, but the overall pattern is the same, so you can actually break them all down into the same pattern. The same thing happens in implementing DevOps practices.
Baruch:
Just before we jump into implementing, credit where credit is due. Joseph Campbell's book that we all learned from is called The Hero with a Thousand Faces. It's 70 years old, but still super relevant. This is where the theory of the monomyth is laid down. So what you are saying is that the DevOps hero's journey feeds into this monomyth theory, and it's the same journey as the journey of Jesus, for that matter.
Anton:
Sure. So it's a journey to a better life through the hardships and challenges that you meet on the way. But when I started to think about what the journey was, I realized that there is always this stage where, no matter how many things you automate, no matter how many tools you replace, at a certain point you start realizing that the biggest bottleneck is human deliberation, human communication.
Well, that's where DevOps started. Take the original talk by John Allspaw and Paul Hammond, which actually caused Patrick Debois to invent DevOps. It was about how they managed to deploy 10 times a day at Flickr: because their devs started thinking a bit more like ops, and their ops started thinking a bit more like devs. So it was all about a change of thinking, it was all about a change of mindset. Nobody was talking about Kubernetes, excuse my French, at that time.
Baruch:
You completely dodged my question, because saying that everybody's problem is collaboration is like saying, well, the problem is that everybody is human. While this is true, it's so broad that it's not really actionable. Let me ask you again, to be actionable, maybe even technology-related: what are the bigger bottlenecks that you see? For example, you mentioned that the majority of companies don't do continuous delivery. Is that one of them, are there others, or is that maybe not the biggest problem at all?
Anton:
Technology-related. Well, the biggest problem, technology-related is, of course, managing complexity.
The scale that we work at today in large organizations especially causes so much complexity that it's really very, very hard to manage this, conceptually and technologically. So it's very easy to build a pipeline for one monolith service while it's still small and it has a small number of dependencies. And when we have one, two, maybe three teams, that's easy. The moment we need to manage a couple of dozen microservices with a couple of dozen teams, that becomes hard because even with the tooling that we're building today, we're not resolving the conceptual understanding of this. So the tooling is there, take Kubernetes for example. Kubernetes is a great tool, but in order to manage this complexity, the tool itself has to be complex. Then we as humans find ourselves limited in our ability to manage it.
Kat:
Kubernetes is really hard. The learning curve is steep there.
Anton:
Exactly. So this is a technological challenge as well as a management challenge, because the engineers are dealing with large-scale technology that is very complex, complex for them to grasp. Nobody has the whole picture. One of the latest projects we're building is a continuous delivery pipeline for data science. There are these ETL workflows that are running on Airflow, and we need to understand how to continuously deliver these Airflow DAGs (directed acyclic graphs) to the production environment. Do we use Airflow itself? Do we use Jenkins? How do these tools interact? None of the continuous delivery engineers understand the complexity that's involved in ETL processes. None of the ETL engineers understand how continuous delivery is done. So how do we make these things work together? No answers, very hard. Now take a manager who tries to make these two teams work together. What does that manager tell them? How does that manager enable them to work together? That's complex.
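[Editor's note: for readers who haven't met Airflow, a DAG is just Python code describing task dependencies. Here is a minimal sketch of an ETL DAG like the ones Anton mentions; the DAG id, task names, and functions are hypothetical placeholders, and the import path assumes a recent Airflow 2.x release.]

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder ETL steps; real ones would pull, reshape, and store data.
def extract():   print("extracting raw data")
def transform(): print("transforming data")
def load():      print("loading data into the warehouse")

with DAG(dag_id="example_etl",
         start_date=datetime(2020, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # The directed acyclic graph itself: extract, then transform, then load.
    extract_task >> transform_task >> load_task
```

Continuously delivering a change then means shipping this file and its dependencies to wherever Airflow reads DAGs from, which is exactly the pipeline question Anton describes.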
Kat:
So it's still a people problem, not strictly a technology problem?
Anton:
Yes. Well, technology is still underdeveloped here; the scale of things might be growing faster than the tooling is developing. As an example, I was thinking about this: there are some pieces of technology that really astonish me. For example, take the automatic suggestions from Google. When you type messages today, there are automated suggestions. You can actually see how it's getting smarter every day somehow, whereas two years ago it was really, really stupid.
Kat:
It was. Another example is those Twitter flashmobs that go around, where you start a sentence with something predefined like, "I'm quitting tech because ..." and then you let the autocomplete finish it for you. It's turning out some stuff that's like actual sentences. One of my friends got, "I'm quitting tech to become an artist." Jeez, I know people who have actually done that. That's a real thing.
Anton:
So what we really need is for our delivery and deployment technology to be like that. If I've built a deployment and it's not going to work, the system should be suggesting this to me, saying, "This, when you roll it out to production, is going to be broken. Maybe what you really meant was for this to be, I don't know, highly available. You're saying you want blob storage, but you probably mean to use S3."
Kat:
We want tooltips. I don't want my deployments to do everything for me because that level of leaving certain security concerns or networking concerns up to an automated wizard makes me a little bit uncomfortable. That's pretty consistently exploitable eventually. But tooltips would be nice. Like, "Hey, are you sure that you really want to do it that way? Because this might be less than ideal." I think some tooling is doing that now. Right? Don't we have configs and patches on the platform?
Baruch:
I think a lot of ChatOps is kind of doing that. One of the examples is Automist. They're giving you hints of what might be wrong and then a very simple way to actually fix it with one pre-canned answer.
Kat:
A linter for your Ops.
Anton:
The industry has been moving in that direction. There's also a startup here in Tel Aviv that I've recently been talking to. They're thinking in that direction for incident response. They're saying we can find certain patterns in how we fix certain problems or how we debug certain problems.
Kat:
So what we need is an AI overlord. That's where we're moving towards.
Anton:
It's not even artificial intelligence, it's more like machine learning, probably. Not a system that thinks for itself, but a system that at least takes into account whatever we humans know.
Kat:
Baruch, could the ML that handles a fictional system like this be the DevOps engineer?
Baruch:
Oh, that's a good question. It's all about the collaboration again. This is why I don't believe that DevOps engineer is a real job: because DevOps is about collaboration. Having DevOps engineers doing DevOps all day is like having collaboration engineers doing collaboration all day. I mean, it doesn't make sense, and this is why we have empowered teams and T-shaped people who know a lot about one topic and a little bit about everything else.
The machine-learning artificial intelligence that we are talking about is probably capable of being an O-shaped engineer and knowing everything about everything. But until we get there, we still need people who specialize in different things, and none of it is DevOps as a technical discipline.
Anton, I wanted to get back to what you said about complexity, because I'm trying to figure out, you as a consultant, you saw it all, you've been through it all, how do you advise people to deal with complexity? Because I think there are two scenarios. One is unnecessary complexity: people just fell in love with some architecture invented in an ivory tower and brought it upon themselves when they didn't need it.
Kat:
That's a sore spot for me. I hate it when people do that.
Baruch:
But the good news is that it's fixable. All you can say is, okay, you throw it all away, you start easy, you start small, you start with your manageable small monolith and go from there. I mean, it's not easy, but you know how to go about it, and the solution is walking away from the complexity. I think the more dreadful scenario is when you cannot walk away from the complexity. When complexity is a necessity, part of your business requirements or the business situation you're in, what do you do then?
Kat:
That’s a big question.
Anton:
You do your best at managing complexity. It may sound a little bit vague, but as a systems thinker (I like to call myself a systems thinker because it sounds so very fancy), I was taught by wonderful folks, for example Dr. Russell Ackoff, that the complexity doesn't lie in the components of the system. If we break the system down into a lot of components and make each component simpler, the system as a whole won't become simpler, because the complexity lies in the interaction between the components.
So managing complexity is basically managing the interactions between components. I think the first thing you should do is somehow try to outline all the possible interactions between the components in your system and throw the necessary effort at managing those interactions. That's actually what we're trying to do today with a technology that I'm a little bit in love with: service meshes.
Baruch:
All right, that's a very nice segue. I'm loving it. Let's switch gears and talk a little bit about technology. That was a lot about methodology and whatnot. Let's talk about the hardcore stuff. Service meshes as an answer to complexity problems. Talk to us.
Anton:
Well, basically, as I said, the best way to manage complexity is by managing the interactions between components, not by making the components simpler. We let the components be as complex as they need to be, and we actually let the interactions be as complex as they need to be. But we need to make these interactions understandable and manageable, and for that we need another buzzword: observability of those interactions. That's one of the things that service meshes give you more or less out of the box.
Kat:
For people who are watching this, who might not be super familiar with the space, can you just go over what a service mesh actually is and what it might look like?
Anton:
Okay, a very short introduction into service meshes without diagrams. That's actually something that's a little bit hard to grasp without any diagram.
Microservices or any complex distributed system that we might've built cause us macro pain. Because there's a lot of complexity to manage. Our services keep talking to each other and some services that we talk to don't reply in a timely manner. Some services are for some reason not available. Or other services don't provide responses that we expect them to provide.
Kat:
Sure, it looks like the microservice version of spaghetti code.
Anton:
Yes, more or less. That's what any complex system grows into with time. Now, in order to manage the challenges of distributed systems, a number of techniques were developed over time. Just like with spaghetti code, you have the issue of tangled interactions between the components. But when you start running a microservice system, there is another very important component of the system: the network. Because in a microservice system, all our components are interacting with each other over the network, right?
Kat:
Sure, yeah.
Anton:
This network is unreliable. This network can be slow. This network can, for some reason, reject the protocols that we want to talk over it. In order to manage all this complexity and lack of reliability, over time we've developed a number of patterns: things like connection pools, in order to have enough connections whenever we need them; all kinds of failure detectors and failover strategies; things like circuit breaking, bulkheading, exponential back-offs; load balancers, of course; and back-pressure techniques such as rate limiting, et cetera.
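[Editor's note: as a minimal sketch of one of the patterns Anton lists, here is exponential backoff with jitter in Python; the function and parameter names are illustrative, not from any particular library.]

```python
import random
import time

def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry a flaky network call, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the failure to the caller
            # Exponential backoff: 0.1s, 0.2s, 0.4s, ... capped at max_delay.
            # Random jitter keeps many clients from retrying in lockstep.
            delay = min(base_delay * 2 ** attempt, max_delay)
            time.sleep(delay * random.uniform(0.5, 1.5))
```

A service mesh proxy applies this kind of policy outside the application code, which is exactly the shift Anton describes next.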
Up until now, we've had a number of libraries that would allow us to manage this, but many of those concerns were still handled in the code by the developers. Yet many of them can be seen as operational concerns. This disconnect, developers implementing operational concerns but operators stumbling upon them, was causing us a lot of issues. That's why, in 2016, the folks at a company called Buoyant created a system called Linkerd, which was actually the first-ever service mesh the world knew. Their creation of this component was based on their experience at Twitter. Twitter is famous for the libraries that they've built specifically for dealing with these issues, things like Finagle. They took that experience from Twitter and built it out into a separate component, saying we don't want each developer to have to integrate with this library, because managing libraries is also painful.
Let's manage all of this out of process. Let's build a mesh of smart proxies that reside next to each one of our services, and let's manage all of these proxies from a centralized location. In this way, we can provide a central solution to all of these distributed-system reliability issues and concerns. That's basically what a service mesh is about. Now, the concept has developed largely thanks to Istio, today's market leader in implementing the service mesh concept, the service mesh pattern, which was developed by Google in collaboration with Lyft and IBM.
That's basically how it works. Besides Istio, we also have Linkerd, and we also have Consul Connect right now. What all of them basically do is put smart proxies next to our services and provide us with a control plane that allows us to manage how these proxies are configured. We can configure them to define all these reliability techniques, to define which traffic is allowed, to define how the traffic is managed. That also allows us to put security barriers on network communication between services. And it gives us observability, because through that control plane we can pull data from all these proxies to understand what our services are chatting about.
Baruch:
We’re basically externalizing network communication and this allows us to see what's going on and to control what should speak with what.
Kat:
It's an abstraction layer on top of your distributed systems so that it seems saner. It's more human-readable instead of just being machine-readable.
Anton:
For the folks who are running Kubernetes, and today in basically 90% of cases you deploy a service mesh over Kubernetes: on Kubernetes, the data plane, the proxies, are actually sidecars that are injected into each one of your pods. The control plane consists of components that run somewhere in the cluster and allow you to pull data from these proxies and to configure them.
Kat:
Because it's just too much, in my opinion, to expect that every engineer ever is a Kubernetes expert or an expert in any other DevOps tool. You should be an expert in your thing, but you can't also be an expert in Kubernetes. You need something there to make that information still accessible and useful to you, so you don't necessarily need to know how to dive into the cluster, look at each microservice individually, and figure out what's wrong with the network.
Anton:
Right. By the way, coming back to what we talked about earlier, there are also tools for the leading service meshes today that allow you to analyze the configuration of the service mesh and find possible misconfigurations. Because again, let's say I have a hundred services, all communicating with one another. I need to define the security rules for this communication: who is allowed to talk to whom, what kind of traffic is allowed, and so on. If something's misconfigured, it's very hard to understand, so we need some smarts in the system that will tell us, "This is where your problem is."
Kat:
Yeah, because a computer or a system or an application is only as smart as the person who built it or the person who programmed it. But computers and applications and systems like that don't get tired. They don't get bored and they don't get distracted and people do. So it's easier for us to make mistakes on repetitive things like security settings for a thousand different microservices. Because we get bored or distracted.
Anton:
In general, it's very easy for us to make mistakes.
Baruch:
Yeah. Okay, so I'm trying to picture this flow diagram for DevOps consultancy, and I think I've got a grip on it. Basically, there are two types of complexity: a necessary one and an unnecessary one. For the unnecessary kind, obviously, you try to convince people to simplify things, to simplify architectures, to maybe migrate away from unnecessary microservices, et cetera. And for the necessary kind, you suggest they get more control, by introducing service meshes as a solution for managing this complexity, taking control over the communication between the different components.
Anton:
That could be service meshes. That could also be just having healthy API evolution practices, so that whenever we release a new version, we don't break the API for consumers. And that can be managed with consumer-driven contract testing. There are techniques out there to manage the interactions between the components of the system; service meshes are just one example.
Baruch:
So for API evolution, for example, do you have any methodology that you suggest, or maybe some tooling around it? Or do you just go and say, make sure that they are versioned? Is there something structured when you talk about the right API approach?
Anton:
Well, nothing new there. First of all, don't break APIs.
Baruch:
Easier said than done. That's a very broad rule, and there are probably good exceptions to it.
Kat:
What would be a good exception when you should break an API?
Baruch:
I would say on major versions you should definitely be allowed to break the APIs.
Anton:
Yeah. And the same thing with the evolution of structured databases: only add fields, and don't remove fields.
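[Editor's note: a tiny Python sketch of the "only add fields" rule and the tolerant-reader style that makes it safe; the user records are hypothetical.]

```python
# Version 1 of a provider's response, and version 2 with a field added.
user_v1 = {"id": 42, "name": "Alice"}
user_v2 = {"id": 42, "name": "Alice", "email": "alice@example.com"}

def display_name(user: dict) -> str:
    # A tolerant reader touches only the fields it needs and supplies a
    # default, so newly added fields are simply ignored and nothing breaks.
    return user.get("name", "<unknown>")

assert display_name(user_v1) == display_name(user_v2) == "Alice"
# Removing "name" would be the breaking change: consumers relying on it
# would silently start seeing the fallback value instead.
```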
Baruch:
But this is how you go straight into this complexity hell, when you have too much of everything just because you keep adding and never really remove anything. How is this good?
Anton:
So then, when you finally decide to break the API, remove all the redundant fields and make sure to communicate that to every user you have out there. Basically, in the end, all the rules about not breaking the API, about how to evolve the API, how to add fields and never remove fields, they're all about assuming what our users do. If we really want to know what our consumers do, we need to have consumer-driven contracts in place.
Baruch:
What are those?
Anton:
Well, the concept itself was suggested by ThoughtWorks. I think it was Martin Fowler himself or one of the folks at ThoughtWorks. And there's tooling for that on the market today. There's a system called Pact. Basically, what it allows you to do is, whenever you build an API, you create a test for that, and whenever you call an API, you also create tests for that, and you export the definitions for those tests. So whenever I consume an API, I can export the definition of whatever my service is expecting from the API. In the tests I build mocks, and these mocks define the behavior that I expect from my provider. And then, somewhere in my CI, if my tests have passed, I take the test definition and I hand it over to the provider, and the provider needs to make sure that these tests also run in their CD pipeline. In this way the provider is responsible for not breaking the API.
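[Editor's note: a minimal sketch of the consumer side of this flow using the pact-python library; the service names, endpoint, and data are hypothetical.]

```python
# pip install pact-python requests
import atexit
import requests
from pact import Consumer, Provider

# The consumer spins up a mock of the provider and records a contract.
pact = Consumer("OrderService").has_pact_with(Provider("UserService"))
pact.start_service()
atexit.register(pact.stop_service)

def test_get_user():
    expected = {"id": 42, "name": "Alice"}
    (pact
     .given("user 42 exists")                    # provider state
     .upon_receiving("a request for user 42")
     .with_request("GET", "/users/42")
     .will_respond_with(200, body=expected))
    with pact:  # verifies the interaction and writes the pact file
        response = requests.get(pact.uri + "/users/42")
    assert response.json() == expected
```

The generated pact file is what gets handed to the provider's pipeline, where it is replayed against the real service, exactly the handover Anton describes.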
Kat:
That's different. That's cool.
Baruch:
Sure, but this adds a lot of complexity. It is justifiable when you deal with external consumers who obviously have expectations of you providing consistent, quality updates. For that, you obviously need a continuous updates pattern where we can trust each other, because I need you to take my updates blindly and you need me not to break things. But microservices, I would say, are about having every microservice expose an API and other microservices consume those APIs, and if I start doing all that around each and every microservice, that's tons of overhead.
Kat:
I don't know. I think it's fine to add additional complexity if it gets you an equal or greater amount of consistency and reliability. But that's just me. I will spend a ton of time and effort making sure that something is automated as much as possible up front so that I don't have to deal with it later. But that's me.
Anton:
Yeah, I agree with you Kat. Basically what they say is you can't really manage complexity with simplicity, so you will have to add complexity in order to manage complexity.
Baruch:
The thing is, there are trade-offs to having microservices in the first place, and I would say contract-based APIs and all that actually add a lot of weight to the counter-argument of maybe I don't need them at all.
Anton:
Well, at a certain scale you must have them, because a monolith is great until it becomes huge. At a certain point, there is this issue of gravity. Only last week somebody was telling me about how they're trying to move their monolith to Kubernetes without breaking it down into microservices. They tried to build a Docker image, but realized that right now their system is packaged as a virtual machine image. And when they tried to build that into a Docker image, they got a Docker image that weighs 125 gigabytes, because their system requires something like 500 RPMs to be installed.
Kat:
Holy crap dude.
Anton:
But that's a working monolith system. They have customers, they have paying customers, their system works. The only problem with it is that it's very, very heavy, so it's hard to deliver. We all know this: if you have a system that takes hours to build, hours to package, and even tens of minutes to get started, there's a lot of time lost, time that you spend just staring at your screen.
Kat:
Yeah, that's a huge waste of time, energy, money, manpower. Nobody wants to be responsible for babysitting that. That's not being an engineer, that's babysitting.
Anton:
That's where the need for microservices arises. You want to make engineers' lives better.
Baruch:
What you are saying is that sometimes there is no escape from the complexity, and you just have to accept it and try to deal with it using the tools that we have.
Anton:
Exactly. Pretending that there is no complexity is the worst thing that could happen. Actually, there is a very nice documentary that I just watched. It's not a new one; it's from 2016, by Adam Curtis, who is one of my favorite documentary directors, making documentaries for the BBC. His latest one is called HyperNormalisation, and it talks about how the world is becoming ever more complex, how politicians mainly try to make us believe that it's simple and that they're in control, and how that just never goes right. And we're actually witnessing this right now, in the midst of something that nobody knows how to control.
Baruch:
Interesting. And that brings me back to what we started with: your consultancy gig and the message that you need to convey to those who hire you. At the end of the day, I think the message that you come with is not comforting. You basically cannot say, "People, I've got this. I have a silver bullet. I'm going to fix everything for you." That would be just irresponsible and wrong. Instead, you have to admit everything is hard, everything is horrible, everything is complicated, I don't have all the answers. And then what?
Anton:
Well, first of all, I can help you reduce the pain. Okay? And then I will be with you on that journey. I will accompany you on the journey and I will be there to hold your hand and together we can make it better. That's basically the messaging.
Kat:
Everything sucks, but you're not alone.
Anton:
Right. Actually, one of the things that I, as a consultant, do is study the industry and find the right examples of these processes making things better. There are examples of these transformations actually bringing gains, profits, happiness, success, whatever. We don't know if it's permanent, but even on a temporary basis, if something works better today than it worked yesterday, then that's proof that there are techniques that can be used.
Baruch:
Okay. So with that kind of optimistic message, it will be a good time for us to wrap up.
Anton:
Thank you, folks. Thank you.
Kat:
Thank you. Bye.