Effective Database Design: Part 4

Adam McNeilly - Dec 6 '18 - - Dev Community

This is the fourth and likely final post in a series about database design and normalization. In our last post, we learned about the second normal form. Up to this point, we have four things to keep in mind when designing our database:

  • Each table has a primary identifier.
  • Each column only has atomic values.
  • No tables have repeating groups.
  • There are no partial dependencies.

To recap, we currently have a database that looks like this:

Let's continue adding on to this database. As with the last posts, I'll explain the third normal form (3NF), show a design that breaks it, and show you how to correct it.

Third Normal Form

A table is in 3NF if it is in 2NF, and has no transitive dependencies.

A transitive dependency is when the following happens:

  1. A determines B
  2. B does not determine A
  3. B determines C

If that didn't stick, let's show an example. Here we are going to revisit our classes table, and add the teacher's office:

id class_name teacher office
1 Math Smith A107
2 Programming Jackson B205
3 History James A100

We can consider our key to be (id, class_name) because the combination of those will always be unique. The transitive dependency here is the following:

  1. (id, class_name) determines the teacher.
  2. The teacher does not determine the (id, class_name). This is because a teacher could teach multiple classes.
  3. The teacher determines the office.

Therefore, we have a transitive dependency from the (id, class_name) to the office.

Problem

Why are transitive dependencies bad? Well, I mentioned teachers could teach multiple classes. Let's see what our table looks like when that happens:

id class_name teacher office
1 Math Smith A107
2 Programming Jackson B205
3 History James A100
4 Science Jackson B205

If you've followed along with the other posts, you'll recognize the same problem again. We've got data redundancy, which can lead to a lack of integrity as a result of data anomalies. I could modify this table such that Ms. Jackson has two different offices, and I don't know which is accurate.

Solution

Once you find a dependency like this, you should isolate those related columns (teacher, office) and move them into a separate table. It's pretty likely that they represent a different entity, anyways - here, we're conflating teachers and classes together. Let's separate them.

We can have a teachers table:

id teacher_name office
1 Smith A107
2 Jackson B205
3 James A100

And our updated classes table:

id class_name teacher_id
1 Math 1
2 Programming 2
3 History 3
4 Science 2

Now, after all of this effort, we have a database that meets 3NF:

Congrats! This was a lot to take in throughout all four posts, but you learned several essential criteria for a well designed database:

  • Each table has a primary identifier.
  • Each column only has atomic values.
  • No tables have repeating groups.
  • There are no partial dependencies.
  • There are no transitive dependencies.

There are even more normal forms, providing things to keep an eye on as your databases grow larger, but the five bullets above are an excellent starting point, and as much as I'm going to cover for now.

I hope you enjoyed this series, and please drop any questions for me in the comments and I will do my best to answer and update the posts as necessary. If you found these bite size examples of normalization easy to follow, let me know and I'll try to keep following up. :)

. . . . . . . . . . . . . . . . . . . . . . . .