Normalization
Normalization has the following functions:
- Evaluate and correct table structures to minimize data redundancies.
- Reduce data anomalies.
- Assign attributes to tables based on determination.
There are three normal forms:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
| Normal Forms | Characteristic |
|---|---|
| First Normal Form (1NF) | Table format, no repeating groups, and primary key identified. |
| Second Normal Form (2NF) | 1NF and no partial dependencies. |
| Third Normal Form (3NF) | 2NF and no transitive dependencies. |
| Boyce-Codd Normal Form (BCNF) | Every determinant is a candidate key (special case of 3NF) |
| Fourth Normal Form (4NF) | 3NF and no independent multivalued dependencies. |
Reasons of Normalization
The database design must be efficient on performance, which should be free of update, insertion and deletion anomalies.
There are three anomalies to avoid:
| Anomaly | Effect |
|---|---|
| Insertion Anomaly | Adding new rows forces user to create duplicate data. |
| Deletion Anomaly | Deleting rows may cause a loss of data that would be needed for other future rows. |
| Modification Anomaly | Change data in a row forces changes to other rows because of duplication. |
First Normal Form (1NF)
- It should only have single (or atomic) valued attributes / columns (not a list).
- All columns in a table should have unique names (No duplicate row).
- Attributes domain should not be change.
Second Normal Form (2NF)
- It should be in the First Normal Form.
- It should not have partial dependencies.
- All columns depend on one primary key in one table (functional dependency).
Partial dependency means that, if there is a composite primary key in a table, and one column in this table only depends on one of the primary keys, it is called partial dependency.
For example, there is a table with a composite primary key (id, name), but there is a column named person only depends on id, then the partial dependency occurs.
Third Normal Form (3NF)
- It is the first and second normal form.
- It does not have transitive dependency.
- One attribute depends on one attribute which is not primary key, can not take
NULLvalues.
A Transitive dependency in a database is an indirect relationship between values in the same table that causes a functional dependency.
In another word, transitive dependency happens when a non-key attribute determines another non-key attribute.
For example, there are three columns having this relation:
graph LR A --> B --> C
Where B depends on A, and C depends on B, which is transitive dependency of A.
To fix this problem, these columns should be separated into two tables to meet the requirement of 3NF.