Memory Management
Paging, segmentation, virtual memory, page replacement.
Paging
Page table, TLB, fragmentation.
Instruction pipelining — overlap execution of multiple instructions to improve throughput.
Classic 5-stage pipeline:
- IF (Instruction Fetch)
- ID (Instruction Decode) + register read
- EX (Execute) in ALU
- MEM (Memory access)
- WB (Write Back)
Without pipelining: 5 cycles per instruction → 1/5 instruction per cycle.
With pipelining (steady state): 1 instruction per cycle. 5× speedup in ideal case.
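To make the overlap concrete, a small Python sketch (purely illustrative, all names ours) that prints the stage occupancy per cycle for an ideal hazard-free pipeline:

```python
# Print a cycle-by-cycle occupancy diagram for an ideal 5-stage pipeline
# (no hazards). Instruction i enters stage s at cycle i + s.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions: int) -> None:
    total_cycles = len(STAGES) + n_instructions - 1
    print("cycle " + " ".join(f"{c:>4}" for c in range(1, total_cycles + 1)))
    for i in range(n_instructions):
        row = ["    "] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = f"{name:>4}"
        print(f"I{i + 1:<4} " + " ".join(row))

pipeline_diagram(4)   # after the 4-cycle fill, one WB completes per cycle
```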
Pipeline hazards:
1. Structural hazard: two stages competing for the same resource (e.g., memory unit serving both IF and MEM). Fix: duplicate (separate I-cache and D-cache).
2. Data hazard: instruction depends on result of preceding one not yet computed.
- RAW (Read After Write): most common. ADD r1, r2, r3 followed by SUB r4, r1, r5 — SUB reads r1 before ADD wrote it.
- Mitigations: forwarding (bypass result from EX/MEM to next ALU input), stalling (insert bubbles), compiler reordering.
3. Control hazard: caused by branches.
- Wait until branch is resolved (stall ~3 cycles) — bad.
- Branch prediction: static (always taken / always not) or dynamic (history-based, e.g., 2-bit saturating counter, branch target buffer). Modern CPUs achieve >95% accuracy.
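A minimal sketch of the 2-bit saturating counter mentioned above (a single counter for one branch; real predictors keep a table of these, indexed by branch address):

```python
# One 2-bit saturating counter (states 0-3); 0-1 predict not-taken,
# 2-3 predict taken. The starting state is an assumption.
def predict_run(outcomes):
    counter, correct = 2, 0                  # start at "weakly taken"
    for taken in outcomes:
        correct += (counter >= 2) == taken   # compare prediction to outcome
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)

# A loop branch taken 9 times, then falling through once:
print(predict_run([True] * 9 + [False]))     # 0.9; only the loop exit mispredicts
```

The saturation is the point: one anomalous outcome nudges the counter a single step, so a loop branch is not mispredicted twice per loop the way a 1-bit scheme would be.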
Speedup formula, for k stages and n instructions at a fixed cycle time: S = (n × k) / (k + n − 1 + stall cycles).
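Plugging in illustrative numbers (100 instructions, 5 stages, 1 ns cycle, an assumed 20 total stall cycles):

```python
# Illustrative speedup calculation; the 20 stall cycles are an assumption.
n, k, cycle = 100, 5, 1.0                      # instructions, stages, ns
stalls = 20
t_unpipelined = n * k * cycle                  # 500 ns
t_pipelined = (k + n - 1 + stalls) * cycle     # 124 ns
print(t_unpipelined / t_pipelined)             # ~4.03x, under the ideal 5x
```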
MEMORY HIERARCHY
| Level | Size | Speed | Cost/byte |
|---|---|---|---|
| Registers (CPU) | ~1 KB | < 1 ns | Highest |
| L1 cache | tens of KB | 1-2 ns | High |
| L2 cache | hundreds of KB | 3-10 ns | High |
| L3 cache | MBs | 10-30 ns | Moderate |
| Main memory (DRAM) | GBs | 50-100 ns | Low |
| SSD / Disk | TBs | ~0.1 ms (SSD) to ~10 ms (HDD) | Lowest |
Principle of locality justifies caches:
- Temporal locality: recently-used items used again soon.
- Spatial locality: items near recently-used items likely to be used.
CACHE BASICS
When CPU requests data, check cache first.
- Hit: data found in cache. Fast.
- Miss: must fetch from lower level. Slow.
Average memory access time (AMAT):
AMAT = Hit time + Miss rate × Miss penalty.
Example: Hit time = 1 ns, hit rate = 0.95, miss penalty = 100 ns.
AMAT = 1 + 0.05 × 100 = 6 ns.
Even a 5% miss rate makes the average access six times the 1 ns hit time. Hence the obsession with high hit rates.
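The formula transcribes directly to code; the two-level extension below uses assumed L2 numbers purely for illustration:

```python
# AMAT transcribed directly; the L2 figures are assumptions for illustration.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

print(amat(1.0, 0.05, 100.0))                  # 6.0 ns, as in the example

# Multi-level: an L1 miss penalty is itself the AMAT of the next level.
l2_amat = amat(hit_time=10.0, miss_rate=0.20, miss_penalty=100.0)  # 30 ns
print(amat(1.0, 0.05, l2_amat))                # 2.5 ns
```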
CACHE MAPPING POLICIES — where does a memory block go in cache?
1. Direct mapped: block goes to one specific cache line. Simple but conflict misses when multiple blocks map to same line.
2. Fully associative: block can go anywhere. No conflict misses, but lookup is expensive (every tag must be compared, in parallel in hardware).
3. Set associative: k-way; block goes to one of k lines in a set. Compromise between simplicity and flexibility. Common: 4-way, 8-way.
Address breakdown:
- Tag (uniquely identifies block).
- Index (which set).
- Offset (byte within block).
REPLACEMENT POLICIES (when cache is full):
- LRU (Least Recently Used): evict least recently accessed. Good but expensive to track.
- FIFO: simple but suffers Belady's anomaly.
- Random: surprisingly competitive; simple.
- NRU and pseudo-LRU: cheap approximations of LRU. MRU is a distinct policy that can beat LRU on large cyclic scans.
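In software, true LRU is a few lines with OrderedDict; the per-access bookkeeping below is what makes exact LRU too expensive in hardware, hence the approximations above. A minimal sketch:

```python
# Minimal LRU cache: OrderedDict keeps entries in recency order.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                      # miss
        self.data.move_to_end(key)           # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)    # evict least recently used
```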
WRITE POLICIES:
1. Write-through: write to cache AND main memory simultaneously. Simple, consistent. Slow (bottlenecked by memory; usually softened with a write buffer).
2. Write-back: write only to cache; mark "dirty"; flush to memory when evicted. Fast but consistency overhead.
Combined with:
- Write-allocate: on write miss, load block into cache first.
- Write-no-allocate: on write miss, write directly to memory.
Typical pairings: write-back + write-allocate, OR write-through + no-allocate.
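A toy model of write-back + write-allocate (class and method names are ours; fully associative with FIFO eviction, chosen for brevity). The point is the dirty bit: memory is updated only when a dirty line is evicted:

```python
# Toy write-back + write-allocate cache; "memory" is just a dict.
class WriteBackCache:
    def __init__(self, capacity: int, memory: dict):
        self.capacity, self.memory = capacity, memory
        self.lines = {}                          # addr -> [value, dirty]

    def _make_room(self):
        if len(self.lines) >= self.capacity:
            addr, (value, dirty) = next(iter(self.lines.items()))
            if dirty:
                self.memory[addr] = value        # lazy flush on eviction
            del self.lines[addr]

    def read(self, addr):
        if addr not in self.lines:               # read miss: fetch the block
            self._make_room()
            self.lines[addr] = [self.memory.get(addr), False]
        return self.lines[addr][0]

    def write(self, addr, value):
        if addr not in self.lines:               # write-allocate on miss
            self._make_room()
        self.lines[addr] = [value, True]         # dirty: memory is now stale
```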
Worked example. A direct-mapped cache has 64 lines, each 16 bytes. Memory address is 32 bits. Find tag, index, offset fields.
- 16 bytes → 4 bits offset.
- 64 lines → 6 bits index.
- Tag = 32 − 6 − 4 = 22 bits.
For a 32-bit address ABCDEF12 (hex):
- Offset = low 4 bits = 0010 = 2.
- Index = next 6 bits = 110001 = 49 (0x31).
- Tag = upper 22 bits = 0x2AF37B.
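The same field extraction in code, checked against the values above:

```python
# Split a 32-bit address into tag / index / offset for this cache
# (16-byte blocks -> 4 offset bits; 64 lines -> 6 index bits).
OFFSET_BITS, INDEX_BITS = 4, 6

def split_address(addr: int):
    offset = addr & ((1 << OFFSET_BITS) - 1)            # low 4 bits
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0xABCDEF12)
print(hex(tag), index, offset)    # 0x2af37b 49 2
```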
Page replacement
FIFO, LRU, optimal, second-chance.
When physical memory is full and a new page is referenced, the OS must evict an existing page. Page replacement algorithms decide which one.
Three classics:
1. FIFO (First-In, First-Out). Evict the page that has been in memory longest.
- Simple to implement (queue).
- Suffers from Belady's anomaly: more frames can paradoxically cause MORE faults.
2. LRU (Least Recently Used). Evict the page that hasn't been used for the longest time.
- Closer to optimal in practice.
- Implementation: stack/counter (expensive) or approximation via "use bits".
- Does NOT suffer Belady's anomaly (it's a "stack algorithm").
3. Optimal (OPT, Belady's). Evict the page that won't be needed for the longest time in the future.
- Provably optimal — fewest faults.
- Requires future knowledge → unimplementable in real OS, but used as a benchmark.
Worked example. Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. With 3 frames:
FIFO: 1*, 2*, 3*, 4* (replaces 1), 1* (replaces 2), 2* (replaces 3), 5* (replaces 4), 1, 2, 3* (replaces 1), 4* (replaces 2), 5 (hit). 9 faults.
LRU: 1*, 2*, 3*, 4* (LRU=1), 1* (LRU=2), 2* (LRU=3), 5* (LRU=4), 1, 2, 3* (LRU=5), 4* (LRU=1), 5* (LRU=2). 10 faults.
OPT: 1*, 2*, 3*, 4* (replaces 3, whose next use is farthest away), 1, 2, 5* (replaces 4), 1, 2, 3* (replaces 1, never used again), 4* (replaces 2, never used again), 5 (hit). 7 faults.
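A small simulator (function names are ours) reproduces the three counts:

```python
# Fault counters for the three classic policies.
def fifo_faults(refs, n):
    frames, faults = [], 0
    for p in refs:
        if p not in frames:
            faults += 1
            frames.append(p)
            if len(frames) > n:
                frames.pop(0)                 # evict oldest arrival
    return faults

def lru_faults(refs, n):
    frames, faults = [], 0
    for p in refs:
        if p in frames:
            frames.remove(p)                  # refresh recency on a hit
        else:
            faults += 1
            if len(frames) == n:
                frames.pop(0)                 # front = least recently used
        frames.append(p)
    return faults

def opt_faults(refs, n):
    frames, faults = set(), 0
    for i, p in enumerate(refs):
        if p in frames:
            continue
        faults += 1
        if len(frames) == n:
            rest = refs[i + 1:]
            # evict the page whose next use is farthest away (or never)
            far = lambda q: rest.index(q) if q in rest else float("inf")
            frames.remove(max(frames, key=far))
        frames.add(p)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3), lru_faults(refs, 3), opt_faults(refs, 3))  # 9 10 7
```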
Belady's anomaly demo. With FIFO and reference: 1,2,3,4,1,2,5,1,2,3,4,5:
- 3 frames → 9 page faults
- 4 frames → 10 page faults (paradox!)
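Reusing fifo_faults and refs from the sketch above, the anomaly is one comparison:

```python
print(fifo_faults(refs, 3))   # 9 faults
print(fifo_faults(refs, 4))   # 10 faults: more frames, more faults
```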
Approximations used in real OSes:
- Second-chance / Clock: FIFO with a use-bit. If the candidate's use-bit is 1, give it a second chance and clear the bit.
- NRU: classify pages by (referenced, modified) bits.
- Aging: per-page counter shifted right each tick, with the reference bit ORed into the top; approximates LRU. WSClock combines the clock scan with a working-set age test.
Linux approximates this with separate active and inactive page lists, promoting and demoting pages based on referenced bits.
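A minimal sketch of the second-chance / clock scan described above: the hand clears use bits as it passes and evicts the first page whose bit is already 0.

```python
# Clock / second-chance page replacement. access() returns True on a hit.
class Clock:
    def __init__(self, n_frames: int):
        self.pages = [None] * n_frames       # resident pages (None = free)
        self.use = [0] * n_frames            # reference ("use") bits
        self.hand = 0

    def access(self, page) -> bool:
        if page in self.pages:               # hit: hardware would set the bit
            self.use[self.pages.index(page)] = 1
            return True
        while True:                          # miss: scan for a victim
            if self.pages[self.hand] is None or self.use[self.hand] == 0:
                self.pages[self.hand] = page
                self.use[self.hand] = 1
                self.hand = (self.hand + 1) % len(self.pages)
                return False
            self.use[self.hand] = 0          # second chance: clear, move on
            self.hand = (self.hand + 1) % len(self.pages)
```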