We recently shared some excellent job advice from YouTuber and Developer Advocate James Quick.
But Corbin and James’ conversation didn’t end with that – they also covered some great information about the differences between SQL and NoSQL and how to choose the proper database for your particular use case.
The difference between relational and non-relational databases
Developers have traditionally stored data in the form of a series of tables that relate to one another through a combination of primary and foreign keys – also known as relational databases. The main way we interact with relational databases is through Structured Query Language (SQL).
For example, two relational database tables might look like this:
hero_table:
name | superpower |
---|---|
Flash | Really fast |
Storm | Controls weather |
villain_table:
name | superpower |
---|---|
Magneto | Great with magnets |
Bellatrix Lestrange | Powerful witch |
A SQL statement to merge two tables would look something like this.
SELECT name, superpower FROM hero_table
UNION
SELECT name, superpower FROM villain_table;
SQL and relational databases work great when you have structured and orderly data. However, as data capturing has increased on such a massive scale – think of the terabytes of real-time data captured on users cruising the internet every day – issues with scaling and tracking data that don’t have a consistent structure have become rampant.
Enter Not Only SQL (NoSQL) and non-relational databases. Instead of using tables consisting of rows and columns to store data, a non-relational database generally uses a JSON object stored with similar objects in a collection. All relationships between collections are informal.
So rather than having a bunch of different tables that need to link to one another, you can store all the information you need in the collection of objects. So our superheroes and villains data collection might look like this:
[
{
"name": "Flash",
"superpower": "really fast",
"type": "hero"
},
{
"name": "Storm",
"superpower": "Controls weather",
"type": "hero"
},
{
"name": "Magneto",
"superpower": "Great with magnets",
"type": "villain"
},
{
"name": "Bellatrix Lestrange",
"superpower": "Powerful witch",
"type": "villain"
}
]
The actual syntax of the NoSQL queries you make to a NoSQL database will depend on the database provider. For example, if you were using MongoDB, you’d use this query to list all the objects from your “supernatural characters” collection:
db.supernatural_characters.find()
So when should you use one and not the other?
SQL vs NoSQL? It depends on the data
As we mentioned earlier, SQL is great if you have lots of structured data that won’t have too many regular schema or column changes.
An excellent example of structured data is a database that tracks baseball statistics. Generally, you’ll always have a player table with a player name, team name, homeruns, at bats etc–as well as a number of calculated averages like batting average and earned run average. You’ll also probably have a team table with the team’s name, team home runs, location, etc.
There is a risk to having data structured like this, however – if you have a set of data that regularly has empty or null values for specific fields (e.g., a calculated field like batting average for a year when a player was out on injury), then you’re unnecessarily increasing your disk space, storage costs, and data retrieval time because the database has to store and recall those empty fields.
NoSQL objects can contain whatever you want –or don’t want – them to have. For that reason, NoSQL databases can be a bit more performant than SQL databases when it comes to reading data.
You may want to use NoSQL over SQL if you need to scale your data across many machines horizontally.
Horizontal scaling allows you to spread processing power across multiple machines, increasing the speed of your CRUD operations. Because of the way relational databases are designed, it’s challenging to get them spread across multiple processing nodes.
On the flip side, NoSQL databases can fall apart if a data access pattern changes since there is no way to enforce schemas and relationships like there is with SQL databases. Without carefully managing what data gets logged and where, then it's very easy to make a mess of the data.
For example, you could have a collection that tracks user demographics like their name, location, and age.
After three years the product pivots to track health data for users and stops tracking demographic data, so now the product is collecting height, weight, and birth date in the same user collection.
A month after that the product owner decides they want the birthday collected as a UTC string instead of a normal date string.
Now you have not only a collection with two different schemas (demographics and health info), but you have a field (birthday) that has two different data types. This can cause lots of processing issues and is one of the reasons you may want to have the schema-enforcement that SQL provides.
The Power of Object-Relational Mapping (ORM)
James dove into object-relational mapping (ORM) towards the end of the database discussion.
If you’re unfamiliar with ORM, it’s a way for developers to interact with both relational and non-relational databases that doesn’t involve generating complex SQL statements – or any SQL at all.
ORMs are an abstraction layer that sits above a database and takes human-readable function calls and translates them into SQL or NoSQL. This is a considerable advantage to developers as it reduces errors from having to write complex queries and the same function definitions can be used to manage multiple different databases.
Here’s what it would look like to insert a row into a relational database with Prisma, an ORM for JavaScript:
2 await prisma.user.create({
3 data: {
4 name: 'Alice',
5 email: 'alice@prisma.io',
6 posts: {
7 create: { title: 'Hello World' },
8 },
9 profile: {
10 create: { bio: 'I like turtles' },
11 },
12 },
13 })
You can find an ORM for just about every language; Hibernate is a popular one for Java, and Django ORM is often used with Python.
Serverless computing, horizontal scaling SQL with PlanetScale
James discusses more or his database knowledge with Corbin along with a brief introduction to serverless computing.
You can find those conversations in the Twitch replay video here.