Normalization

1NF, 2NF, 3NF, BCNF, functional dependencies.

Functional dependencies

Armstrong's axioms, closure, minimal cover.

Database normalization — FDs, 1NF to BCNF
Notes

Why normalize?

  • Remove redundancy.
  • Avoid update/insert/delete anomalies.
  • Preserve dependencies.

FUNCTIONAL DEPENDENCY (FD)

X → Y means: value of X uniquely determines value of Y.

Armstrong's axioms:

  1. Reflexivity: if Y ⊆ X, then X → Y.
  2. Augmentation: if X → Y, then XZ → YZ.
  3. Transitivity: if X → Y and Y → Z, then X → Z.

Derived rules:

  • Union: X→Y, X→Z → X→YZ.
  • Decomposition: X→YZ → X→Y, X→Z.
  • Pseudo-transitivity: X→Y, WY→Z → WX→Z.

KEYS:

  • Super key: any set of attributes that uniquely identifies a tuple.
  • Candidate key: minimal super key.
  • Primary key: chosen candidate key.
  • Prime attribute: part of any candidate key. Non-prime: not in any CK.

NORMAL FORMS:

1NF (First Normal Form):

  • All attributes contain atomic (indivisible) values.
  • No repeating groups or arrays.
  • Bad: {StudentID, Name, Courses (CS101, CS102)}.
  • Good: separate row for each course.

2NF:

  • 1NF + no partial dependency.
  • Every non-prime attribute is fully dependent on the entire primary key (not just part).
  • Issue if PK is composite and some attribute depends only on part of it.

Example: PK = (StudentID, CourseID). Attributes: StudentName, CourseFee.
StudentName depends only on StudentID (partial) → split.

3NF:

  • 2NF + no transitive dependency.
  • Non-prime attribute should not depend on another non-prime.

Example: Employee(EmpID, DeptID, DeptName). EmpID → DeptID → DeptName. DeptName depends on EmpID transitively via DeptID.

Solution: EmpID → DeptID; DeptID → DeptName (separate table).

3NF rule: for every FD X → A, either:

  • X is a super key, OR
  • A is a prime attribute.

BCNF (Boyce-Codd Normal Form): stricter than 3NF.

  • For every FD X → A, X must be a super key. (No exception for prime attribute.)
  • All 3NF tables are BCNF if their FDs satisfy this. Most relations in 3NF are also BCNF; rare edge cases differ.

4NF and 5NF (advanced):

  • 4NF: no multi-valued dependencies (MVDs).
  • 5NF (PJ/NF): no join dependencies that aren't consequences of candidate keys.

EXAMPLE PROBLEM:

Relation R(A, B, C, D, E), FDs: A → BC, CD → E, B → D, E → A.

Find candidate keys.

  • Compute closures of single attributes:

    • A⁺ = ABCDE (via A→BC, B→D, CD→E). So A is a CK.
    • B⁺ = BD.
    • C⁺ = C.
    • D⁺ = D.
    • E⁺ = EABCD (via E→A, then A→BC, etc.). So E is a CK.
  • Check pairs:

    • CD⁺ = CDEABC = all → CD is a CK.
    • BC⁺ = BCDEA = all → BC is a CK.

Candidate keys: {A, E, CD, BC}.

Prime attributes: A, B, C, D, E (all are prime).

In this case, every attribute is prime → likely 3NF since prime rule satisfied.


EXAM HOOKS:

  • A relation is in BCNF if for every non-trivial FD, LHS is a super key.
  • 3NF allows X→A where A is prime (even if X is not super key).
  • Decomposition into BCNF may not preserve all FDs; 3NF decomposition always does.
  • Lossless decomposition: R = R₁ ∪ R₂ with R₁ ∩ R₂ being super key of at least one.

Normal forms

1NF, 2NF, 3NF, BCNF, anomalies.

Normal forms 1NF → BCNF — the practical decision tree
Notes

The normal forms are nested — BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF. Each removes a class of redundancy.

1NF: every attribute holds atomic values (no arrays, no nested tables). Almost always trivially true in modern relational DBs.

2NF: 1NF + no partial dependency (no non-prime attribute depends on only part of a candidate key). Only relevant if you have composite keys.

3NF: 2NF + no transitive dependency (no non-prime attribute depends on another non-prime attribute). For every FD X → Y in the relation, either X is a superkey OR Y is a prime attribute.

BCNF: 2NF + every non-trivial FD X → Y has X as a superkey. Stricter than 3NF — every determinant must be a superkey.

Quick test:

For every functional dependency X → Y in your schema:
  If X is a superkey → fine.
  Else if Y is a prime attribute (in some candidate key) → 3NF (but not BCNF).
  Else → not 3NF.

Worked example.
Schema: STUDENT(roll, course, instructor, dept).
FDs: roll → dept, course → instructor.
Candidate key: {roll, course}.

  • "roll → dept": roll is not a superkey, dept is non-prime → violates 3NF (transitive dependency).
  • "course → instructor": course is not a superkey, instructor is non-prime → also violates 3NF.

To get to 3NF: split into STUDENT(roll, dept), ENROLLMENT(roll, course), COURSE(course, instructor).

Why stop at BCNF? Higher forms (4NF, 5NF, DKNF) handle multi-valued dependencies and join dependencies, rarely needed for typical schemas.

Database normalization — FDs, 1NF to BCNF
Notes

Why normalize?

  • Remove redundancy.
  • Avoid update/insert/delete anomalies.
  • Preserve dependencies.

FUNCTIONAL DEPENDENCY (FD)

X → Y means: value of X uniquely determines value of Y.

Armstrong's axioms:

  1. Reflexivity: if Y ⊆ X, then X → Y.
  2. Augmentation: if X → Y, then XZ → YZ.
  3. Transitivity: if X → Y and Y → Z, then X → Z.

Derived rules:

  • Union: X→Y, X→Z → X→YZ.
  • Decomposition: X→YZ → X→Y, X→Z.
  • Pseudo-transitivity: X→Y, WY→Z → WX→Z.

KEYS:

  • Super key: any set of attributes that uniquely identifies a tuple.
  • Candidate key: minimal super key.
  • Primary key: chosen candidate key.
  • Prime attribute: part of any candidate key. Non-prime: not in any CK.

NORMAL FORMS:

1NF (First Normal Form):

  • All attributes contain atomic (indivisible) values.
  • No repeating groups or arrays.
  • Bad: {StudentID, Name, Courses (CS101, CS102)}.
  • Good: separate row for each course.

2NF:

  • 1NF + no partial dependency.
  • Every non-prime attribute is fully dependent on the entire primary key (not just part).
  • Issue if PK is composite and some attribute depends only on part of it.

Example: PK = (StudentID, CourseID). Attributes: StudentName, CourseFee.
StudentName depends only on StudentID (partial) → split.

3NF:

  • 2NF + no transitive dependency.
  • Non-prime attribute should not depend on another non-prime.

Example: Employee(EmpID, DeptID, DeptName). EmpID → DeptID → DeptName. DeptName depends on EmpID transitively via DeptID.

Solution: EmpID → DeptID; DeptID → DeptName (separate table).

3NF rule: for every FD X → A, either:

  • X is a super key, OR
  • A is a prime attribute.

BCNF (Boyce-Codd Normal Form): stricter than 3NF.

  • For every FD X → A, X must be a super key. (No exception for prime attribute.)
  • All 3NF tables are BCNF if their FDs satisfy this. Most relations in 3NF are also BCNF; rare edge cases differ.

4NF and 5NF (advanced):

  • 4NF: no multi-valued dependencies (MVDs).
  • 5NF (PJ/NF): no join dependencies that aren't consequences of candidate keys.

EXAMPLE PROBLEM:

Relation R(A, B, C, D, E), FDs: A → BC, CD → E, B → D, E → A.

Find candidate keys.

  • Compute closures of single attributes:

    • A⁺ = ABCDE (via A→BC, B→D, CD→E). So A is a CK.
    • B⁺ = BD.
    • C⁺ = C.
    • D⁺ = D.
    • E⁺ = EABCD (via E→A, then A→BC, etc.). So E is a CK.
  • Check pairs:

    • CD⁺ = CDEABC = all → CD is a CK.
    • BC⁺ = BCDEA = all → BC is a CK.

Candidate keys: {A, E, CD, BC}.

Prime attributes: A, B, C, D, E (all are prime).

In this case, every attribute is prime → likely 3NF since prime rule satisfied.


EXAM HOOKS:

  • A relation is in BCNF if for every non-trivial FD, LHS is a super key.
  • 3NF allows X→A where A is prime (even if X is not super key).
  • Decomposition into BCNF may not preserve all FDs; 3NF decomposition always does.
  • Lossless decomposition: R = R₁ ∪ R₂ with R₁ ∩ R₂ being super key of at least one.