About this article

In this article, we use one of the entities from our case study and perform a database normalization example. We start with the products table, unnormalized, and progress through the first, second, and third normal forms. Finally, we show the results of applying the same techniques to the entire Entity Relationship Diagram for our online store.

This post is the fourth on a series exploring the subject of relational databases for the full-stack developer.

For quick access, here is the list of the articles on the topic:

  1. Introduction to Relational Databases
  2. Database Design: Entities, Attributes, and Relationships
  3. Database Design: Entity-Relationship Diagrams
  4. Database Design: Normalization (this article)
  5. Database Design: Entity-Relationship Diagram to Structured Query Language
  6. Deploying PostgreSQL for development and testing
  7. Structured Query Language Cheat Sheet
  8. Working with databases from Python

Introduction

Database normalization is a process of structuring a relational database in a way that reduces data redundancy and improves data integrity.

Data redundancy is the repetition of the same data in different places in your database. Avoiding repetition is important because repetition may cause anomalies when inserting, deleting, and updating data.

You achieve normalization by applying successive normal forms to your database design. Wikipedia lists ten of these forms. In most cases, however, using the first, second, and third normal forms is enough to produce a normalized database.

Table with normal forms and their requirements.  Source: wikipedia.

Normal forms and their requirements

You should perform normalization in order. In other words, your tables must comply with the first normal form before attempting to apply the second normal form, and so on.

As it is often the case with abstract concepts, database normalization is better explained and understood by following an example.

Unnormalized table

Let’s take the products entity from the last post and create a table with it:

This product table is in unnormalized from

Unnormalized products table

First normal form (1NF)

To achieve 1NF, your tables must meet the following criteria:

  • Table columns must contain atomic values.
  • The table must not have any repeating groups.
  • Every row in the table must be identifiable by a primary key.

Atomic values

In the context of database normalization, your columns have atomic values if every value in the column is a single, indivisible value.

In the unnormalized products table above, the product_color column for the iPhone XS cover stores two values: Chili Red and Steel Blue. Therefore, product_color violates the atomicity requirement. You could modify the table to store the colors in different columns as follows:

To avoid breaking the atomicity requirement, we separated the colors in two columns but now we end up with a repeating group.

Products table after separating each color value into its own column

Repeating groups

A table has repeating groups when several of its columns store the same type of value. In the last version of your products table, we have two columns for storing product_color and therefore you have a repeating group: product_color.

To fix the repeating group issue, you have to separate each product color into its own row. The resulting table is now:

By repeating the product item, once for each column we end up with the table in first normal form.

Product table after eliminating the repeating groups

Primary key

Now, let’s look at the primary key. In the previous post, you added an auto-incrementing id attribute to all entities for the primary key. If you look at the last iteration of your products table, however, the id attribute no longer works as a primary key. The first two rows in the table have the same id but refer to different products: an iPhone XS cover chili red and an iPhone XS cover steel blue.

For this table, we need a composite key: all products in the table can be identified by a combination of id and product_color. You can say that the products table meets 1NF.

Data redundancy

There is an issue with data redundancy that you might as well fix now, before moving on to the second normal form.

If the store introduces a new product, say a Pouch, and it comes in Chili Red, Steel Blue, and Ink, your product table becomes:

Product table in 1NF after adding the item Pouch in three different colors.

Product table in 1NF after adding the item Pouch in three different colors

Notice the problem with data redundancy. If you, for marketing reasons, decide to change the color name from Chili Red to Sunset Red, you must find all uses of the original color name in the product table and update them to the new color name.

Performing these updates is not only more work than necessary, but it can also lead to inconsistent data states if some instances are updated, and others are not.

You can avoid this pitfall by separating the product_colors into a new table and creating a relationship between products and product_colors:

Products table after extracting product_colors to a separate table and relating both tables with a foreign key.

Products table after extracting product_colors to a separate table and relating both tables with a foreign key

New product_colors table

New product_colors table

Now to change Chili Red to Sunset Red, you only need to update the first record in the product_colors table, and the data remains consistent.

The new composite primary key for the products table is id and product_color_id.

Second normal form (2NF)

Once your table complies with 1NF, you can satisfy 2NF by eliminating any partial dependencies.

Partial dependencies

To have partial dependencies in a table, you must have a composite primary key and have columns that do not depend on the entire composite key.

The last iteration of your products table has a composite key: id and product_color_id. However, the columns name, cost, and price depend only on id. The column quantity depends on the entire composite key.

You can see how name, cost, and price are partially dependent on the primary key. You must get rid of all partial dependencies to satisfy 2NF.

To solve this problem, you need to create a new associative table to relate products and product_colors and move the quantity column to the new table. The result follows:

Products table in 2NF.

Products table in 2NF

New products_product_colors table in 2NF.

New products_product_colors table in 2NF

New products_product_colors table in 2NF.

Product_colors table in 2NF

You can see now that all columns in the products table depend on the primary key id. Furthermore, all columns in the new products_product_colors table depend on the composite primary key product_id and product_color_id.

Third normal form (3NF)

To satisfy 3NF, your tables must be in 2NF, and you must eliminate any transitive dependencies.

Transitive dependencies

A transitive dependency exists in a table when a column’s dependence in the primary key happens indirectly through another column.

In our example, there are no transitive dependencies, and, therefore, the tables already meet the requirements for 3NF.

For the sake of completeness, we are going to add to the products table two columns, supplier and county_of_origin, to illustrate what transitive dependencies are and how to avoid them.

Let’s say you have two suppliers: ACME from China and Globex from India. The products table with the modifications follows:

Products table with transitive dependency country_of_origin.

Products table with transitive dependency country_of_origin

You can easily see that country_of_origin is indirectly dependent on id though the supplier column, and therefore it creates a transitive dependency.

As usual, you fix the problem by moving supplier and country_of_origin to a new table and creating a relationship between the new table and the products table.

The result follows, and your products table is now in 3NF.

Products table in 3NF.

Products table in 3NF

This image has an empty alt attribute; its file name is suppliers-1.png

Normalization for the entire Entity Relationship Diagram

Now you should review all entities and relationships in the diagram for your online store and modify them as needed to achieve 3NF. Your resulting diagram should be similar to this one:

Normalized entity-relationship diagram for Atelier-M online store.

Normalized entity-relationship diagram for our online store

Conclusion

In this article, you were exposed to database normalization by following a simple example. Then you had the chance to apply your knowledge by normalizing the entity-relationship diagram for your online store.

In the process, you learned about the different normal forms and about important concepts like atomic values, repeating groups, partial dependencies, and transitive dependencies.

In the next article, you will translate the normalized entity-relationship diagram into Structured Query Language statements.

References

You may also like:

Show CommentsClose Comments

Questions? Comments? Leave a reply to start a conversation...

Discover more from Wander In Dev

Subscribe now to keep reading and get access to the full archive.

Continue reading