Best practices sound like a great thing. Why wouldn’t we want to make sure our software is the best it can be? How better to make it so than to use practices that everyone considers the best? This was something I was extremely concerned with early in my career.
I’ve worked in a variety of situations: small companies to large companies, consumer products and B2B products, pure software companies and companies that sold physical things, new software to large mature systems. This helped me learn the hard way that while there are software practices that are good on average, it is important to always take into account the context that you are building software for.
Let’s look at microservices as an example. It’s become common for a group of developers to criticize monolithic systems. I’ve definitely been a part of those discussions and rarely defended monoliths in the past. Having 20-30 developers contribute code to a single codebase gets to be a pretty messy affair, especially when someone wants to modify the “core” code or update a library. Having each team maintain its own set of microservices instead can solve a lot of problems.
However, microservices come with an immense amount of development overhead. Typically each one lives in its own repository. That makes things complicated if you have a feature that requires modifying multiple microservices. It also complicates deployment because you have to make sure you deploy everything in the right order and that backwards compatibility isn’t broken along the way. That’s not a problem in a monolith because all the code is deployed at once.
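To make the "right order" problem concrete, here is a minimal sketch of working out a deployment order from service dependencies. The service names and the `depends_on` mapping are hypothetical, and it assumes Python 3.9+ for the standard library's `graphlib`. A real pipeline would also have to handle rollbacks and version compatibility, which this ignores.

```python
from graphlib import TopologicalSorter

# Hypothetical services: each key depends on the services in its set,
# so those must be deployed first.
depends_on = {
    "checkout": {"billing", "inventory"},
    "billing": {"accounts"},
    "inventory": set(),
    "accounts": set(),
}


def deploy_order(depends_on):
    """Return one valid order in which dependencies come before dependents."""
    return list(TopologicalSorter(depends_on).static_order())


order = deploy_order(depends_on)
print(order)
```

Every service added to the mapping is one more thing to keep accurate by hand, which is exactly the kind of overhead a single-deploy monolith avoids.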
Microservices also add infrastructure overhead because each one needs its own monitoring and backups. The funny thing about monitoring and backups is that you need to test them periodically to make sure they still work. The worst time to find out all your backups are corrupted is when you actually need them. More microservices means more time spent doing this testing.
For a growing company, microservices also complicate the onboarding process. A suite of microservices has a more complex setup for new developers than a monolith would (unless the monolith is set up really poorly). This can be mitigated with Docker, but then someone has to maintain the Docker setup whenever a new microservice is added or a system change is made to an existing one.
In the end, a single team of a handful of developers will probably do best with a monolith. The overhead of maintaining a monolith grows with every new developer and will eventually outweigh the overhead of microservices; a transition should be considered at that point. Building microservices before then will hurt more than help.
Another example is tying data together for efficiency. In SQL databases, we can do this with JOIN queries. In NoSQL, we can do this by nesting more data in a document. There’s typically a lot of sense in this. Why make 2 or more queries to your database when you can just make 1? Optimizing in this way can help scale in the short to medium term and can be better for performance.
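As a small illustration of the trade-off, here is a sketch that fetches the same data once with a JOIN and once with two separate queries. The `users` and `orders` tables and their contents are made up for the example, and it uses an in-memory SQLite database purely for convenience.

```python
import sqlite3

# Hypothetical schema and data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")


def orders_with_join(conn, user_id):
    """One round trip: the database ties users and orders together."""
    return conn.execute(
        "SELECT u.name, o.id, o.total "
        "FROM users u JOIN orders o ON o.user_id = u.id "
        "WHERE u.id = ? ORDER BY o.id",
        (user_id,),
    ).fetchall()


def orders_with_two_queries(conn, user_id):
    """Two round trips: the application ties the data together itself."""
    (name,) = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    orders = conn.execute(
        "SELECT id, total FROM orders WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    return [(name, order_id, total) for order_id, total in orders]


print(orders_with_join(conn, 1))
print(orders_with_two_queries(conn, 1))
```

Both functions return the same rows. The JOIN version is shorter and makes a single trip to the database, but it only works while both tables live in the same place.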
Unfortunately, multiple challenges with this start popping up in the long term. Let’s assume you hit a point where you decide it is a good idea to split your monolith into microservices. Having most of your data tied together limits your ability to do so. You end up analyzing all your data to find an appropriate split, only to discover there isn’t much that can be split out. To make the transition, a large amount of code has to be rewritten to decouple the data.
It also makes it hard to scale, at least for SQL. SQL servers aren’t known for scaling horizontally, and there is only so much a single server can do. Having lots of JOINs makes it hard to split tables into their own databases, since you generally can’t run an ordinary JOIN query across separate servers. It also makes it hard to shard the database, since you have to make sure all the data you need to join is on the same shard.
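Here is a rough sketch of why sharding and JOINs pull against each other. The modulo-based routing, the shard count, and the function names are all assumptions made for illustration; real systems use a variety of routing schemes.

```python
NUM_SHARDS = 4  # hypothetical shard count


def shard_for(key: int) -> int:
    """Route a row to a shard by taking its shard key modulo the shard count."""
    return key % NUM_SHARDS


def can_join_locally(user_id: int, order_id: int, shard_orders_by_user: bool) -> bool:
    """Check whether a user row and one of their order rows share a shard.

    Users are always routed by user_id. Orders are routed either by the
    owning user_id (co-located with the user) or by their own order_id.
    """
    user_shard = shard_for(user_id)
    order_shard = shard_for(user_id if shard_orders_by_user else order_id)
    return user_shard == order_shard


print(can_join_locally(1, 10, shard_orders_by_user=True))
print(can_join_locally(1, 10, shard_orders_by_user=False))
```

Routing orders by `user_id` keeps each user's orders next to the user, so that JOIN stays local. Any other JOIN you rely on (orders to products, say) needs the same kind of co-location, and a table can only be sharded by one key.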
That being said, making sure all your data is independent has its own issues. Aside from having the performance overhead of making more queries, you also have to deal with more development overhead in managing all the independent data sources.
There isn’t a right answer here. Doing what’s “best” depends on what kind of software you’re building and who you’re building it for. There are trade-offs with every approach. You can’t simply pick something because everyone else considers it a best practice. Your unique situation has to be accounted for. Every design decision needs a real justification, not just “that’s how it’s done.”