Effective Database Design: Part 3

Adam McNeilly - Dec 5 '18 - - Dev Community

This is the third post in a series about database design and normalization. In our last post, we discussed the first normal form and took an initial step toward proper database design. We learned that we should have a primary key, atomic values, and no repeating groups. We also learned how to separate our entities to avoid redundancy.

Now we can go a step further and make sure we abide by the second normal form.

Second Normal Form

A table is in 2NF if it is in 1NF, and there are no partial dependencies. This means that non-prime columns (columns that are not part of the key, or possible keys) depend entirely on the primary key.

In a table that has a single primary key column, this will always hold true, so we don't need to stress about it. If we have a composite key, though, we could break this rule.

For example, let's add a teacher column to our student_classes table from the last part. Let's say that Ms. Smith teaches math, Ms. Jackson teaches programming, and Ms. James teaches history.

student_id class_id teacher
1 1 Smith
1 2 Jackson
2 3 James

The teacher column is determined only by the class_id, not by the entire primary identifier of (student_id, class_id).

Problem

Before we discuss a solution, let's look at the problems this could cause. Consider if Prince also took programming:

student_id class_id teacher
1 1 Smith
1 2 Jackson
2 3 James
2 2 Jackson

Notice any familiar problems? We have data redundancy issues. Twice we see that Ms. Jackson teaches programming, which also indicates we could be at risk of an update anomaly if I only changed one of those rows.

Solution

Since the teacher column is dependent only on the class_id column, we should group those entities together.

Let's put the teacher column in the classes table:

id class_name teacher
1 Math Smith
2 Programming Jackson
3 History James

We've got a lot to keep track of in our three tables so far, so we'll just look at a high level diagram. Shout out to dbdiagram.io for making a helpful tool:

The 2NF was a quick one to explain, but now we've got a pretty well designed database. Let's recap everything up to the 2NF:

  • Each table has a primary identifier.
  • Each column only has atomic values.
  • No tables have repeating groups.
  • There are no partial dependencies.

We're in a good spot now that we know how to ensure these four things. In the next post, we'll continue on this normalization adventure to comply with the third normal form.

. . . . . . . . . . . . . . . . . . . . . . . .