Arrays and Strings

Free preview

Operations, complexity, common patterns.

This is a free preview chapter. Unlock all of GATE Computer Science

Array Address Calculation

1-D Array Base Address Formula
Formulas

For a 1-D array A declared with lower bound L, the address of A[i] is:

Address(A[i]) = Base + (i - L) * w

where Base is the address of A[L] and w is the size of each element in bytes. In C, L = 0, so Address(A[i]) = Base + iw. Memory aid: 'count the gap from the lower bound, scale by element size.' Always confirm whether the question gives you the address of the FIRST element or of element[0]; if A starts at index L, you must subtract L. The total bytes occupied by an array of n elements is nw. These formulas underpin all higher-dimensional addressing, so memorize the (index - lowerbound)*width pattern.

2-D Array: Row-Major vs Column-Major
Formulas

For a 2-D array A[L1..U1][L2..U2] with element size w, let rows R = U1-L1+1 and columns C = U2-L2+1.

Row-major (C, C++): Address(A[i][j]) = Base + ((i-L1)*C + (j-L2))*w

Column-major (Fortran): Address(A[i][j]) = Base + ((j-L2)*R + (i-L1))*w

Memory aid: row-major walks across a row first, so multiply the row offset by the number of COLUMNS; column-major walks down a column first, so multiply the column offset by the number of ROWS. Mistaking R for C is the single most common GATE error. Always compute counts (R, C) before plugging in, and respect non-zero lower bounds by subtracting them.

Worked Example: 2-D Row-Major Address
Worked example

Array A[1..10][1..15], each element 4 bytes, Base = 1000, row-major. Find Address(A[4][6]).

Here L1=1, L2=1, columns C = 15-1+1 = 15, w = 4.
Address = 1000 + ((4-1)*15 + (6-1))*4
= 1000 + (45 + 5)4
= 1000 + 50
4 = 1000 + 200 = 1200.

Shortcut check: A[4][6] is in row 4. Rows 1-3 are fully before it = 3*15 = 45 elements, plus 5 elements before column 6 in row 4 = 50 elements total, times 4 bytes = 200 offset. Adding Base gives 1200. Always count 'full rows before + elements before in current row.'

Array Operations and Complexity

Time Complexity of Core Array Operations
Notes

Access by index: O(1) (random access via address arithmetic). Search (unsorted): O(n) linear scan; (sorted): O(log n) with binary search. Insertion/deletion at the END of a dynamic array: O(1) amortized; at an arbitrary position: O(n) because elements must shift. Inserting at the BEGINNING is worst case—every element shifts, O(n). Memory aid: 'arrays are cheap to read, expensive to reshape.' Updating an element in place is O(1). The fixed-size limitation forces O(n) resizing in static arrays. Contrast with linked lists, which give O(1) insert/delete at a known node but O(n) access. GATE frequently tests this access-vs-modify tradeoff.

Dynamic Array Amortized Doubling
Summary

A dynamic array (e.g., C++ vector, Java ArrayList) doubles its capacity when full. Copying n elements costs O(n), but this happens rarely. Summing the cost of n appends, total copy work is n + n/2 + n/4 + ... < 2n, so the amortized cost per append is O(1). Memory aid: 'doubling makes the rare expensive copy pay for many cheap appends.' If instead you grew by a CONSTANT amount each time, total work would be O(n^2) and amortized O(n)—much worse. This geometric-growth amortization is a favorite GATE/interview topic; remember the < 2n bound from the geometric series.

In-Place Array Reversal
Worked example

To reverse an array in place, swap A[i] with A[n-1-i] for i from 0 to n/2 - 1. This uses O(1) extra space and O(n) time (n/2 swaps). Example for n=5 (indices 0..4): swap (0,4), swap (1,3); index 2 is the middle and stays. Memory aid: 'two pointers walking inward, stop when they meet or cross.' A common bug is looping i to n instead of n/2, which reverses twice and yields the original array. Reversal is the building block for rotation: rotate-left by k = reverse first k, reverse rest, reverse whole.

String Algorithms and Pattern Matching

Naive vs KMP Pattern Matching
Notes

Naive string matching slides the pattern (length m) over the text (length n) one position at a time, re-comparing from scratch on each mismatch: worst case O(nm). KMP (Knuth-Morris-Pratt) precomputes a failure/LPS (Longest Proper Prefix which is also Suffix) array in O(m), then scans the text once without backtracking, giving O(n + m) total. Memory aid: 'KMP never re-reads a text character.' The LPS array tells how far to shift the pattern on a mismatch by reusing already-matched prefix information. KMP space is O(m) for the LPS array. Rabin-Karp uses hashing for average O(n+m) but worst-case O(nm) due to hash collisions.

Computing the KMP LPS (Failure) Array
Worked example

LPS[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i]. 'Proper' means it cannot be the whole substring. Example, pattern = 'ABABACA':
A -> 0
AB -> 0
ABA -> 1 (prefix 'A' = suffix 'A')
ABAB -> 2 ('AB')
ABABA -> 3 ('ABA')
ABABAC -> 0 (no match)
ABABACA -> 1 ('A')
So LPS = [0,0,1,2,3,0,1]. Memory aid: 'match the longest border.' On a text mismatch after matching j characters, jump the pattern index to LPS[j-1] instead of 0. Building LPS is O(m) using two pointers.

Common String Complexities Summary
Summary

Concatenating two strings of lengths a and b: O(a+b). Comparing two strings: O(min length) worst case. Reversing a string: O(n). Checking palindrome: O(n) two-pointer. Naive substring search: O(n*m). KMP / Z-algorithm: O(n+m). Computing all character frequencies: O(n) with a fixed-size count array (O(1) if alphabet size is constant, e.g., 256 ASCII). Sorting characters of a string: O(n log n) comparison sort, or O(n + k) counting sort for fixed alphabet k. Memory aid: 'linear scans for most one-pass tasks, n log n only when sorting by comparison.' In C, string length via strlen is O(n) because it scans to the null terminator.

Sparse Matrices and Special Arrays

Sparse Matrix Triplet Representation
Notes

A sparse matrix has most entries zero. Storing it fully wastes space, so we use a triplet (3-tuple) representation: each non-zero element is stored as (row, column, value). The table typically has a header row recording (total rows, total columns, number of non-zeros). For a matrix with t non-zero elements, storage is (t+1) rows x 3 columns. Memory aid: 'one row per non-zero, plus one header.' This saves space only when t is small relative to mn; the break-even point is roughly when 3(t+1) < m*n. Other formats include CSR (Compressed Sparse Row) and CSC, which compress further by storing row pointers.

Storing Triangular and Banded Matrices
Formulas

A lower-triangular n x n matrix has only i >= j entries non-zero, totaling n(n+1)/2 elements; storing these in a 1-D array saves ~half the space. Row-major index of A[i][j] (1-based, i>=j) in the packed array = i(i-1)/2 + (j-1). Memory aid: 'rows above contribute 1+2+...+(i-1) = i(i-1)/2 elements.' An upper-triangular matrix is symmetric to this. A symmetric matrix also needs only n(n+1)/2 stored values since A[i][j]=A[j][i]. A tridiagonal (banded) matrix has at most 3n-2 non-zero elements (main diagonal n, plus two adjacent diagonals of n-1 each).

Worked Example: Lower-Triangular Storage Index
Worked example

Store a 1-based lower-triangular matrix row by row in a 1-D array. Find the array index (0-based) of A[4][2].

Elements before row 4 = rows 1,2,3 have 1+2+3 = 6 elements = 34/2 using i(i-1)/2 with i=4 -> 43/2 = 6.
Within row 4, A[4][2] is the 2nd element, offset (j-1) = 1.
Index = 6 + 1 = 7 (0-based), i.e., the 8th stored element.

Formula check: i(i-1)/2 + (j-1) = 4*3/2 + (2-1) = 6 + 1 = 7. Memory aid: 'triangular numbers count the rows above, then add the column offset within the current row.'