Developer's Inner Conflicts, or How Over-engineering Increased Our Platform Expenses 10x

Juraj Malenica - Nov 4 '19 - Dev Community

I've noticed that we developers have some psychological bugs - I see it in my colleagues, young and old, and in myself. Things that we know aren't rational, aren't "optimized for our success", and can get us in trouble. But we still do them for some reason.

I wanted to share my view of three such bugs through a story that happened not so long ago.

A little backstory

Projects at the company I work for revolve around building complex AI assistants. To make the creation and development of the assistants as easy as possible, we built a platform that enables you to mostly focus on the assistant's logic and behavior (think any web framework for web development). The platform consists of a dozen microservices, MongoDB, Kafka, Memcached, etc. Basically our components with stable and scalable third-party solutions.

We also decided to run our projects on AWS (even the ones in beta), to remove the need for handling servers. This meant we could use the excess time for something more productive.

Developing the platform

Looking back, now that I've gained some experience using the platform, the first psychological bug is easy to recognize. But back then, things weren't so simple and straightforward: hard decisions had to be made, and there was no time to spare.

While we were developing some database-related parts of the platform, a question came up: "What if an action on the database fails?" We didn't know what could happen, how, or why, but we had become emotionally attached to the platform, and this was an opportunity for things to go wrong that we just couldn't let slide.

So, as would any developers who really care about their project, we covered a lot of ground in the next few days:

  • data redundancy (multiple Mongo instances)
  • atomic transactions
  • auto-reconnection to the Mongo client
  • strict database access capabilities
  • "helper functions" for uncommon tasks (e.g. updating an object in an array that is an object in some collection)

We needed to run three Mongo server instances to support all of this, which increased the complexity of the platform, which in turn took more time... Months later, we decided to turn off most of these safety measures, as they were only slowing us down and causing us problems.
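For illustration, here is a minimal sketch of what a "helper function" for updating an object inside an array might look like. It is not our actual code: the function name, the collection layout, and the field names are all made up for the example. It only builds the filter and update documents, relying on MongoDB's positional `$` operator, so it runs without a database.

```python
from typing import Any, Dict, Tuple


def build_array_item_update(
    doc_id: Any,
    array_field: str,
    match_key: str,
    match_value: Any,
    changes: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build a (filter, update) pair that changes one object inside an array field.

    Hypothetical helper, for illustration only.
    """
    if not changes:
        raise ValueError("changes must contain at least one field")
    # The filter has to match the array element, so that "$" knows which one to update.
    query = {"_id": doc_id, f"{array_field}.{match_key}": match_value}
    # "<array>.$.<field>" targets the first array element matched by the filter.
    update = {
        "$set": {f"{array_field}.$.{key}": value for key, value in changes.items()}
    }
    return query, update


# Example with made-up names: disable the "greeting" intent of one assistant.
query, update = build_array_item_update(
    doc_id="assistant-1",
    array_field="intents",
    match_key="name",
    match_value="greeting",
    changes={"enabled": False},
)
# With pymongo, this pair would be passed as: collection.update_one(query, update)
print(query)
print(update)
```

A helper like this is handy, but it is also exactly the kind of thing that is easy to build long before anyone needs it.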

Bug #1: over-engineering because of engineering

Over-engineering often comes when we're faced with complicated tasks, which almost all developers love solving. This is very satisfying for our ego and is almost always fun. Most often, though, it's not worth the time and effort for the company.

We can almost always argue that doing something will make this piece of code more stable, or increase the execution speed by 10%. But when you're still trying to figure out how a car works, it doesn't really matter what color it is (at least not yet). To fight the over-engineering urge, check out the YAGNI ("You Aren't Gonna Need It") principle.

Dev-Ops on demand

While finishing up a project that used our now almost finished platform, we decided to set it up on AWS, which we didn't have much prior experience with, as we used to set everything up on our own servers. One of our first impressions was that it's very nice how AWS makes horizontal scaling fairly easy.

And having more instances of Memcached, Kafka, and Mongo is always better than running just one instance. If one instance fails, the others will take over, the component can handle more load, and so on. Also, the changes we made to the database management as part of the first bug meant that we now had to have multiple instances of Mongo.

What basically happened was this: I just didn't want to counter our dev-ops guy, although there was no clear reason for scaling up. At least he had some fun.

Bug #2: never saying "no"

Fear of conflict is natural, but conflict is sometimes necessary. Ideally, though, we always try to come to a solution together, so there is always a clear understanding of the logic behind our decisions.

We always make an effort to be open to new ideas, but the world is not black and white and that means hard decisions will have to be made. And oftentimes, the best thing we can do is consider "what minimal effort will produce maximum output?"

Scaling down

All this put us in a situation where each project on the platform was spending lots of money because it used lots of resources. And with one project done, two in the pipeline, and four that we needed to start working on, that would amount to way too much.

We needed a better solution. But for what? And why? We had all been so focused on pushing forward that no one communicated with other teams about what's important and what's not. We needed to change that.

Once this became obvious, we did something that was needed much earlier - a meeting with all stakeholders of the platform and the projects, including the product team, dev-ops, and our CEO. Once we went through the big picture, hashed out our plans for the future of those projects and defined our requirements, we could finally move forward.

Bug #3: working in silos

Involving other people in the decision-making process is painful. It takes time to get on the same page, there can be disagreements, and sometimes a decision won't even be made. So it's easier to just go with our gut feeling or educated guess.

Here we should always try to get back to the question "why" - why are we doing this? Why does this make sense? Why will this help us long-term? Doing that is hard, but other stakeholders can help.

Final result

To fix the spending issue, we had two main approaches: down-scaling the instances and resource sharing. We were far from having any performance problems, so down-scaling was a sensible thing to do, and we decided to share resources (Kafka, Mongo, Memcached) until a project becomes profitable enough to stand on its own.

The final result was that we could deploy 10 projects for the same amount of money that we would need to deploy just one project before.
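To make the arithmetic behind down-scaling and sharing concrete, here is a rough sketch of the cost comparison. Every number in it is a placeholder, not our real AWS bill or instance counts, and the component list and function names are assumptions made for the example. Plug in your own figures to see how the two setups compare.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Component:
    name: str
    instances: int
    monthly_cost_per_instance: float  # placeholder price, not a real AWS rate


def monthly_cost(components: Iterable[Component]) -> float:
    """Total monthly cost of a group of components."""
    return sum(c.instances * c.monthly_cost_per_instance for c in components)


def dedicated_total(per_project: Iterable[Component], n_projects: int) -> float:
    """Every project runs its own copy of everything."""
    return n_projects * monthly_cost(list(per_project))


def shared_total(
    shared: Iterable[Component],
    per_project: Iterable[Component],
    n_projects: int,
) -> float:
    """Shared components are paid for once; the rest is paid per project."""
    return monthly_cost(shared) + n_projects * monthly_cost(list(per_project))


# "Before": each project gets its own scaled-up Mongo, Kafka, and Memcached.
before_per_project = [
    Component("mongo", 3, 100.0),
    Component("kafka", 3, 100.0),
    Component("memcached", 2, 50.0),
    Component("services", 1, 100.0),
]

# "After": smaller, single instances shared by all projects.
after_shared = [
    Component("mongo", 1, 50.0),
    Component("kafka", 1, 50.0),
    Component("memcached", 1, 25.0),
]
after_per_project = [Component("services", 1, 50.0)]

one_project_before = dedicated_total(before_per_project, 1)
ten_projects_after = shared_total(after_shared, after_per_project, 10)
print(f"1 project, dedicated:  {one_project_before:.2f}")
print(f"10 projects, shared:   {ten_projects_after:.2f}")
```

The point of the sketch is the shape of the formula: shared infrastructure is a fixed cost split across projects, so each extra project only adds its own small per-project part.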

Conclusion

These three psychological bugs (over-engineering, avoiding "no", working in silos) are just three of many. We as developers are optimization maniacs, and I hope learning about these common mistakes will help you optimize better.

So remember:

  1. Aim for minimal effort and maximum results,
  2. It's ok to say no, and
  3. Two heads are better than one, so talk to other people.

Good luck 🖖
