About this article
In this article, we use one of the entities from our case study and perform a database normalization example. We start with the products table, unnormalized, and progress through the first, second, and third normal forms. Finally, we show the results of applying the same techniques to the entire Entity Relationship Diagram for our online store.
This post is the fourth on a series exploring the subject of relational databases for the full-stack developer.
For quick access, here is the list of the articles on the topic:
- Introduction to Relational Databases
- Database Design: Entities, Attributes, and Relationships
- Database Design: Entity-Relationship Diagrams
- Database Design: Normalization (this article)
- Database Design: Entity-Relationship Diagram to Structured Query Language
- Deploying PostgreSQL for development and testing
- Structured Query Language Cheat Sheet
- Working with databases from Python
Database normalization is a process of structuring a relational database in a way that reduces data redundancy and improves data integrity.
Data redundancy is the repetition of the same data in different places in your database. Avoiding repetition is important because repetition may cause anomalies when inserting, deleting, and updating data.
You achieve normalization by applying successive normal forms to your database design. Wikipedia lists ten of these forms. In most cases, however, using the first, second, and third normal forms is enough to produce a normalized database.
You should perform normalization in order. In other words, your tables must comply with the first normal form before attempting to apply the second normal form, and so on.
As it is often the case with abstract concepts, database normalization is better explained and understood by following an example.
Let’s take the products entity from the last post and create a table with it:
First normal form (1NF)
To achieve 1NF, your tables must meet the following criteria:
- Table columns must contain atomic values.
- The table must not have any repeating groups.
- Every row in the table must be identifiable by a primary key.
In the context of database normalization, your columns have atomic values if every value in the column is a single, indivisible value.
In the unnormalized products table above, the product_color column for the iPhone XS cover stores two values: Chili Red and Steel Blue. Therefore, product_color violates the atomicity requirement. You could modify the table to store the colors in different columns as follows:
A table has repeating groups when several of its columns store the same type of value. In the last version of your products table, we have two columns for storing product_color and therefore you have a repeating group: product_color.
To fix the repeating group issue, you have to separate each product color into its own row. The resulting table is now:
Now, let’s look at the primary key. In the previous post, you added an auto-incrementing id attribute to all entities for the primary key. If you look at the last iteration of your products table, however, the id attribute no longer works as a primary key. The first two rows in the table have the same id but refer to different products: an iPhone XS cover chili red and an iPhone XS cover steel blue.
For this table, we need a composite key: all products in the table can be identified by a combination of id and product_color. You can say that the products table meets 1NF.
There is an issue with data redundancy that you might as well fix now, before moving on to the second normal form.
If the store introduces a new product, say a Pouch, and it comes in Chili Red, Steel Blue, and Ink, your product table becomes:
Notice the problem with data redundancy. If you, for marketing reasons, decide to change the color name from Chili Red to Sunset Red, you must find all uses of the original color name in the product table and update them to the new color name.
Performing these updates is not only more work than necessary, but it can also lead to inconsistent data states if some instances are updated, and others are not.
You can avoid this pitfall by separating the product_colors into a new table and creating a relationship between products and product_colors:
Now to change Chili Red to Sunset Red, you only need to update the first record in the product_colors table, and the data remains consistent.
The new composite primary key for the products table is id and product_color_id.
Second normal form (2NF)
To have partial dependencies in a table, you must have a composite primary key and have columns that do not depend on the entire composite key.
The last iteration of your products table has a composite key: id and product_color_id. However, the columns name, cost, and price depend only on id. The column quantity depends on the entire composite key.
You can see how name, cost, and price are partially dependent on the primary key. You must get rid of all partial dependencies to satisfy 2NF.
To solve this problem, you need to create a new associative table to relate products and product_colors and move the quantity column to the new table. The result follows:
You can see now that all columns in the products table depend on the primary key id. Furthermore, all columns in the new products_product_colors table depend on the composite primary key product_id and product_color_id.
Third normal form (3NF)
A transitive dependency exists in a table when a column’s dependence in the primary key happens indirectly through another column.
In our example, there are no transitive dependencies, and, therefore, the tables already meet the requirements for 3NF.
For the sake of completeness, we are going to add to the products table two columns, supplier and county_of_origin, to illustrate what transitive dependencies are and how to avoid them.
Let’s say you have two suppliers: ACME from China and Globex from India. The products table with the modifications follows:
You can easily see that country_of_origin is indirectly dependent on id though the supplier column, and therefore it creates a transitive dependency.
As usual, you fix the problem by moving supplier and country_of_origin to a new table and creating a relationship between the new table and the products table.
The result follows, and your products table is now in 3NF.
Products table in 3NF
Normalization for the entire Entity Relationship Diagram
Now you should review all entities and relationships in the diagram for your online store and modify them as needed to achieve 3NF. Your resulting diagram should be similar to this one:
In this article, you were exposed to database normalization by following a simple example. Then you had the chance to apply your knowledge by normalizing the entity-relationship diagram for your online store.
In the process, you learned about the different normal forms and about important concepts like atomic values, repeating groups, partial dependencies, and transitive dependencies.
In the next article, you will translate the normalized entity-relationship diagram into Structured Query Language statements.
- Database normalization
- First normal form
- Second normal form
- Third normal form
- Data anomalies
- Database atomic values
- What does “repeating groups” mean?
- Partial dependency (database)
- What is transitive dependency in a database
You may also like: