11. Functional Dependencies & Normalization
Functional Dependencies & Normalization
Imagine a university that stores every piece of student information in one giant table: the student's name, their course, the course's lecturer, the lecturer's office, and the department. Now the lecturer moves offices. How many rows need updating? Potentially thousands — one for every student enrolled in every course that lecturer teaches. Miss even one row, and the database now contains two different office numbers for the same person. This is called an update anomaly, and it is the direct result of poor table design.
Normalization is the formal, mathematical process of restructuring tables to eliminate these problems. This module walks through every normal form, showing exactly what problem each one solves and how to fix it.
The Three Anomalies of Bad Design
Before learning the cure, understand exactly what you are curing. Poorly designed tables suffer from three types of anomalies — situations where ordinary operations produce incorrect or incomplete results.
An insertion anomaly occurs when you cannot record a piece of information without also recording unrelated information. If your table requires a student to exist before you can record a course, you cannot add a new course with no students enrolled yet. The table structure itself forces unwanted dependencies.
An update anomaly occurs when changing one fact requires updating the same value in multiple rows. If a lecturer's phone number appears on every row of every course they teach, changing their number means hunting down every row. One missed update leaves the database in an inconsistent state.
A deletion anomaly occurs when deleting a row inadvertently destroys unrelated information. If the last student drops a course and that deletes the row, you lose the course record itself — even though the course still exists.
All three anomalies have the same root cause: storing facts about multiple different subjects in a single table. Normalization forces each table to be about exactly one subject.
Functional Dependencies — The Mathematical Foundation
Normalization is built on a concept called a functional dependency. Column B is functionally dependent on column A when knowing the value of A always tells you exactly one value of B. Written formally: A → B, read as A determines B.
For example, in a Student table: knowing a student's ID tells you exactly one student name. Therefore StudentID → StudentName. This is a functional dependency.
But in a flattened enrollment table with columns StudentID, CourseID, CourseName, and InstructorName: CourseID → CourseName is a functional dependency, because every row with the same CourseID has the same CourseName. The course name is determined by the course, not by the student. This is the problem — CourseName does not belong in a table whose key is (StudentID, CourseID).
First Normal Form — 1NF
A table is in First Normal Form when every column holds exactly one atomic value — a single, indivisible piece of information — and there are no repeating column groups.
Consider a table where a student's phone numbers are stored in columns Phone1, Phone2, and Phone3. This violates 1NF because the structure assumes a maximum of three numbers and creates empty columns when a student has fewer. Searching for a specific number requires checking three separate columns.
The fix is straightforward: create a separate StudentPhone table with two columns — StudentID and PhoneNumber. Each phone number gets its own row. The original Student table is cleaner and the phone table is flexible — students can have any number of phone numbers.
Similarly, a column that stores a comma-separated list of values — like Maths, Physics, Chemistry in a single cell — violates 1NF. Lists belong in separate rows, not packed into a single column.
Second Normal Form — 2NF
A table is in Second Normal Form when it is already in 1NF and every non-key column is fully functionally dependent on the entire primary key — not just part of it.
2NF only applies to tables with a composite primary key — a key made from two or more columns. If the primary key is a single column, 2NF is automatically satisfied.
Consider an Enrollment table with primary key (StudentID, CourseID) and additional columns StudentName, CourseName, and Grade.
The Grade column is correctly placed here — it depends on both the student and the course together. A student can earn different grades in different courses.
But CourseName depends only on CourseID — not on StudentID at all. You could know the course name just from the CourseID alone. This is called a partial dependency and it violates 2NF.
The fix: remove CourseName from Enrollment and put it in its own Course table where CourseID is the primary key. Now CourseName is fully determined by its own table's key.
Third Normal Form — 3NF
A table is in Third Normal Form when it is in 2NF and contains no transitive dependencies — meaning no non-key column is determined by another non-key column.
Consider a Student table: StudentID, StudentName, DeptID, DeptName, DeptBuilding.
DeptID is not the primary key — StudentID is. But DeptName and DeptBuilding are determined by DeptID, not by StudentID. The chain is: StudentID → DeptID → DeptName. DeptName is reached through an intermediate non-key column. This transitive chain causes update anomalies — change a building name and you must update every student row in that department.
The fix is to extract DeptID, DeptName, and DeptBuilding into their own Department table. The Student table keeps only DeptID as a foreign key. Now the Student table contains only facts about students, and the Department table contains only facts about departments.
Applying the Rules in Practice
Take the following flat, denormalized table and trace the normalization process step by step.
Suppose you start with this single table:
-- Original flat table (violates 2NF and 3NF):
-- StudentID | StudentName | CourseID | CourseName | InstructorID | InstructorName | InstructorDept
Step one: identify the primary key. It is (StudentID, CourseID) because only the combination uniquely identifies an enrollment row.
Step two: test for 2NF. Which columns depend on only part of the key? CourseName depends only on CourseID. InstructorID depends only on CourseID. Extract these to a Course table:
CREATE TABLE Course (
CourseID INT PRIMARY KEY,
CourseName VARCHAR(100) NOT NULL,
InstructorID INT
);
Step three: test for 3NF in the remaining columns. InstructorName and InstructorDept depend on InstructorID, not on StudentID or CourseID. This is a transitive dependency. Extract them to an Instructor table:
CREATE TABLE Instructor (
InstructorID INT PRIMARY KEY,
InstructorName VARCHAR(100) NOT NULL,
Department VARCHAR(100)
);
The final Enrollment table keeps only the key columns and the one fact that genuinely depends on the full composite key — the grade:
CREATE TABLE Enrollment (
StudentID INT,
CourseID INT,
Grade DECIMAL(4,2),
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Student(StudentID),
FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
);
How to Use the Normalization Lab
The lab presents you with a flat, denormalized CSV export from a legacy university system containing seven columns mixed together: student_id, student_name, course_id, course_name, instructor_id, instructor_name, and instructor_department. Your task is to sort these columns into three normalized tables using a JSON editor.
Here is an example of the same task applied to a different scenario. Suppose you had a flat table with these columns: employee_id, employee_name, department_id, department_name, department_floor. Using the same JSON format the lab expects, you would map them like this:
{
"Employees": ["employee_id", "employee_name"],
"Departments": ["department_id", "department_name", "department_floor"],
"Assignments": ["employee_id", "department_id"]
}
Apply the same logic to the lab. Student facts go in the Students object. Instructor facts go in Instructors. The Enrollments object holds the bridge columns that connect students to courses, plus the course details that depend on the CourseID. When your JSON structure satisfies all three normal forms, the lab confirms your answer.
Data Cleansing Forge
3NF Normalization Mapping
The legacy university system exports all data into a single, flat CSV file. This creates massive insertion and update anomalies. You must restructure the columns into three normalized tables.
Primary Directive
- Remove transitive dependencies.
- Each table must be governed by its Primary Key.
- Course details and Student bridging belong in Enrollments.
Knowledge Check
Ready to test your understanding of 11. Functional Dependencies & Normalization?