ER Model and Relational Model
Entity-Relationship, relational schema, keys.
ER Model Fundamentals
An entity is a real-world object; an entity set is a collection of similar entities. Attribute types to memorize (mnemonic SCAM-D): Simple vs Composite (Name -> First, Last), Single-valued vs Multivalued (phone numbers, shown by double ellipse), Stored vs Derived (Age derived from DOB, shown by dashed ellipse), Key attribute (underlined). A weak entity has no key of its own and depends on an owner (identifying) entity; it is drawn with a double rectangle and connected via a double-diamond identifying relationship. Its discriminator (partial key) is underlined with a dashed line. NULL handling: attributes can be NULL when not applicable or unknown. Remember: composite = breakable into parts; multivalued = many values for one entity; derived = computable, never stored.
ER diagram symbols (memorize for quick recall): Rectangle = entity set; Double rectangle = weak entity; Ellipse = attribute; Double ellipse = multivalued attribute; Dashed ellipse = derived attribute; Underlined attribute = primary key; Diamond = relationship; Double diamond = identifying relationship (for weak entity); Line connecting = participation. Cardinality is written on lines as 1, N, M. Participation: single line = partial, double line = total. Min-max (look-across vs look-here) notation gives precise (min,max) constraints. Shortcut: a weak entity ALWAYS has total participation in its identifying relationship, so that connecting edge is always a double line.
Specialization is top-down (split a superclass into subclasses, e.g., Employee -> Engineer, Secretary). Generalization is bottom-up (combine entities into a superclass). Both use the ISA triangle and support attribute inheritance. Constraints: Disjoint (d) vs Overlapping (o) tells whether an entity can belong to multiple subclasses; Total vs Partial tells whether every superclass entity must belong to some subclass. Aggregation treats a relationship (with its participating entities) as a higher-level entity so it can participate in another relationship (modeling a relationship between an entity and a relationship). Memory aid: Specialization = Subclass creation (both start with S); Aggregation = treating a relationship as one Abstract block.
ER to Relational Mapping
Converting ER to tables (memorize by cardinality): 1:1 — add the primary key of one side as a foreign key on the other (prefer the side with total participation); can merge into one table. 1:N — put the primary key of the '1' side as a foreign key in the table of the 'N' side; no separate relationship table needed. M:N — ALWAYS create a separate relationship table whose primary key is the combination of the primary keys of both participating entities (plus relationship attributes). Shortcut: M:N forces a new table; 1:N never needs one; 1:1 optionally merges. Total participation on the N-side in a 1:N lets you add a NOT NULL constraint on that foreign key.
Weak entity set W with owner E: create a table for W including all of W's attributes PLUS the primary key of E as a foreign key. The primary key of W's table = {primary key of E} UNION {discriminator of W}. The identifying relationship needs no separate table. Multivalued attribute A of entity E: create a SEPARATE table with columns = {primary key of E, A}; its primary key is the full combination (PK of E, A) since one entity has many values. Composite attributes are flattened into their simple components (the composite name itself is dropped). Derived attributes are typically not stored. Memory aid: every multivalued attribute = its own new table.
GATE frequently asks the MINIMUM number of tables. Rule of thumb for two entities A, B with relationship R: 1:N or N:1 with the N-side having total participation -> minimum 2 tables (fold relationship into N-side). 1:1 with at least one side total -> minimum 1 table possible (merge both entities + relationship). M:N -> minimum 3 tables (A, B, and the relationship table) and this cannot be reduced. For 1:N where the N-side has PARTIAL participation, you still typically need only 2 tables (FK can be NULL). Always check participation: total participation is what enables merging tables to reduce the count.
Relational Model and Keys
A relation is a set of tuples; its schema is R(A1,...,An) with degree = number of attributes and cardinality = number of tuples. Key terms: domain (allowed values), tuple (row), attribute (column). A relation is a SET, so duplicate tuples are not allowed and tuple ordering is immaterial. Integrity constraints: (1) Domain constraint — values from the attribute's domain. (2) Key constraint — candidate key values are unique. (3) Entity integrity — no primary key attribute may be NULL. (4) Referential integrity — a foreign key value must either match some primary key value in the referenced relation or be entirely NULL. Memory aid: PK can't be NULL (entity integrity); FK can be NULL (referential integrity allows it).
Superkey: any attribute set that uniquely identifies a tuple. Candidate key: a MINIMAL superkey (no proper subset is a superkey). Primary key: one chosen candidate key (cannot be NULL). Alternate keys: candidate keys not chosen as primary. Foreign key: attribute(s) referencing another relation's primary key. Prime attribute: an attribute that is part of SOME candidate key; non-prime otherwise. Counting superkeys: if a relation with n attributes has a single candidate key of one attribute, the number of superkeys = 2^(n-1) (the key must be present, the other n-1 attributes are optional). Shortcut: every candidate key is a superkey, but not every superkey is a candidate key (minimality fails).
Relation R(A, B, C, D) where AB is the only candidate key. A superkey must contain the key AB; the remaining attributes C and D are each optional, giving 2^2 = 4 superkeys: {AB, ABC, ABD, ABCD}. If R(A,B,C,D,E) has two candidate keys A and BC: superkeys containing A = 2^4 = 16; superkeys containing BC = 2^3 = 8; superkeys containing BOTH A and BC = 2^2 = 4 (fix A, B, C; D, E optional). By inclusion-exclusion total = 16 + 8 - 4 = 20. Always use inclusion-exclusion when multiple candidate keys overlap or are disjoint to avoid double counting.
Relational Integrity and Schema Design
When a referenced (parent) tuple is deleted or its key updated, referential integrity must be maintained via one of: CASCADE (propagate the delete/update to child tuples), SET NULL (set the child's foreign key to NULL — requires FK be nullable), SET DEFAULT (set FK to a predefined default), and RESTRICT/NO ACTION (reject the operation if dependent child tuples exist). Default in most systems is NO ACTION/RESTRICT. Memory aid: CASCADE = follow the parent; SET NULL = orphan but flagged; RESTRICT = block. ON DELETE CASCADE can chain through multiple levels. Choosing CASCADE carelessly risks unintended mass deletions; RESTRICT is the safest conservative choice.
NULL means 'unknown' or 'not applicable', NOT zero or empty string. Key pitfalls for GATE: (1) NULL = NULL evaluates to UNKNOWN, not TRUE — use IS NULL. (2) Aggregate functions ignore NULLs (except COUNT(*) which counts all rows). (3) Three-valued logic: AND/OR/NOT over {TRUE, FALSE, UNKNOWN}. TRUE OR UNKNOWN = TRUE; FALSE AND UNKNOWN = FALSE; TRUE AND UNKNOWN = UNKNOWN. (4) A WHERE clause passes a row only if the condition is TRUE (UNKNOWN rows are dropped). (5) UNIQUE constraints generally allow multiple NULLs, but PRIMARY KEY allows none. Shortcut: any arithmetic with NULL yields NULL; any comparison with NULL yields UNKNOWN.
With T=TRUE, F=FALSE, U=UNKNOWN: AND -> T AND U = U, F AND U = F, U AND U = U. OR -> T OR U = T, F OR U = U, U OR U = U. NOT U = U. Quick rule: in AND, F dominates (F with anything = F); in OR, T dominates (T with anything = T); U appears only when no dominating value is present. For WHERE/HAVING/JOIN-ON, only rows evaluating to TRUE qualify; both FALSE and UNKNOWN rows are excluded. For CHECK constraints the opposite leniency applies: a CHECK passes unless it evaluates to FALSE (UNKNOWN is accepted). Remember this asymmetry: WHERE needs TRUE; CHECK only rejects FALSE.