As a dev, you've most likely encountered the database migration issue, when you've a working production environment and you try to run the migration command from development or staging to production and it cause a conflict in your production database because you've altered the table in your development and now its not able to sync those changes properly.
Database Administrators and Engineers at big tech companies and almost every other business have experienced this issue, many tried to build their own solutions and few of them published their solutions as an open source project, and the terms used for this solution are branching and online schema changes.
Let's examine the problem to see what's going on 🧤
Merging the individual work of a big team of engineers into one database is difficult. Conflicts can become more difficult to resolve when simultaneous changes are made to multiple sections of code that are dependent on each other.
For example, if one developer changes the schema of a table by adding a new column, and another developer attempts to insert data into the table using the old schema, a migration conflict will occur. In this case a queue needs to be implemented in the migration workflow and the common scenario is to use a branch to separate production databases from development.
Online schema changes are a technique that allows database administrators to alter the structure of a database without taking it offline. These changes can include adding or modifying columns in a table, altering the data type of a column, or renaming a table.
The main problem that online schema changes are trying to solve is the downtime that is typically required when making structural changes to a database. In traditional database management systems, making changes to the schema of a database often requires taking the database offline, which can cause disruption to applications and services that rely on the database. Online schema changes allow database administrators to make these changes without taking the database offline, which minimizes downtime and makes it easier to maintain and evolve the database over time.
How severe is the problem? 🤔
Tesla performs upto 130 deployments per day. Around a million electric vehicles are currently roaming around on the roads. A single conflict while merging databases to production can cause a hefty loss to the company. Similar could happen to other companies including Instagram which deploys 30-50 times a day.
Process of Branching 🛣
Branching can be accomplished in just a few steps.
- Create a separate branch from your production branch to use for development purposes. This branch will be a copy of your production schema that you can modify without affecting the production environment.
- Modify the structure of this copy of your database by making changes such as removing a column or adding a new table.
- Test the changes you made to the database structure in a development environment to ensure that they are functioning correctly.
- Create a comparison between the production and development schemas. This will allow you to review and understand the changes that will be implemented, and the issues with the schema, such as missing unique keys, that need to be addressed before the deployment happens.
- Now to deploy development to production in zero downtime, you need to carry out the process of online schema changes.
Process of Online Schema Changes 🐣
Online Schema Changes follows a pretty simple pattern.
- Make a new table that is identical to the development table, but without any data. We'll call this the "ghost" table.
- Modify the ghost table through an ALTER statement without any additional cost, as it is empty.
- Initiate a lengthy process of transferring rows from the production table to the ghost table, doing so in small increments.
- Once the copying of the existing data is finished, the migration is typically ready to switch over to the new system.
- The final step in the process is the cut-over, which involves renaming the production table and replacing it with the ghost table. This may cause a brief outage, but in general, users and applications will not be aware that the table has been swapped out from under them.
Which products are tackling this issue? ⛑
Several products are trying to solve this problem and in doing so they are making the process as simple as possible so you don't have to implement branching and online schema changes on your own.
Cockroach Labs provides a free forever plan but has some limitations related to online schema changes.
YugabyteDB has a free tier only for educational and non-production use. It has multiple unsafe DDL operations in both of its query languages, YSQL and YCQL.
PlanetScale also has a free-forever tier, and it just have two limitations, FOREIGN KEY constraints and RENAME for columns and tables, which can be easily solved through their easy to follow tutorials, FOREIGN KEY solved and RENAME solved, so with these solutions they completely eliminated all of their drawbacks.
Conclusion ✨
PlanetScale not only addresses issues with branching and online schema changes, but also offers a range of unique features that are not available in other products and services. Check out all the features in the documentation. Don't wait too long, create your account and get started with PlanetScale. In case you have some questions regarding the article or want to discuss something feel free to connect with me on LinkedIn 💖
If you work at a startup and want me to write for your product please do connect with me 🥰