Engineering at PrepAtlas
Hi, I'm Anirudh. I build and run PrepAtlas — a Next.js + Supabase exam-prep platform for Indian students. It's currently strongest on railway and government-job tracks, and expanding to NCERT, JEE/NEET, banking, and SSC. This page is the build log: three architecture phases (EC2 → ECS → EKS), the five decisions that shaped them, and the triggers behind each migration.
If you came here from my LinkedIn to verify I built this end-to-end: the entire production stack runs on roughly $10/mo — one small EC2 box behind nginx, Supabase Free, Route 53 — with no Vercel, no Pinecone, no Kubernetes. v1 lives on AWS today; v2 (ECS Fargate + ALB) is the containerized scale-out I'm migrating into as load justifies; v3 (EKS + service mesh) is the Kubernetes story for when there's a team. Every claim on this page is real and defensible.
Architecture evolution · 3 phases on AWS
Three architectures. One product. Each migration is triggered by a specific load, team, or reliability requirement, not by resume-driven engineering. Diagrams below show what runs today, what is being built, and what comes next.
v1 · Lean MVP · EC2 + nginx

What serves traffic today. 6 weeks from idea to live. One VM, one process manager, one Postgres. Every web-perf optimization in scope before any infra scale-out.
- Single EC2 t3.small in ap-south-1 (Mumbai) — Ubuntu 24.04, Node 20, pm2
- nginx HTTP/2, immutable static caching, upstream keepalive
- Supabase Free · Postgres + pgvector + RLS · idempotent migrations
- Deploy: git push → SSH pull → pnpm build → pm2 reload (< 60 s, zero downtime)
v2 · Containerized scale-out · ECS Fargate + ALB

Triggered when traffic + reliability requirements outgrow a single VM. Multi-AZ HA, edge CDN, WAF for abuse, blue/green rollouts with automatic rollback.
- ECS Fargate tasks across AZ-1a + AZ-1b behind an ALB
- CloudFront at the edge, WAF for L7 rules, Secrets Manager for env
- ElastiCache Redis for sessions + rate limiting · SQS + Lambda for async jobs
- Deploy: GitHub Actions → ECR push → CodeDeploy blue/green · automatic rollback on alarm
v3 · Kubernetes at scale · EKS + service mesh

Triggered by team size, multi-region latency targets, or the need to isolate services. Pod-level autoscaling, GitOps, service mesh, multi-region data. Premature Kubernetes is the classic resume-driven mistake — v3 waits for the actual signal.
- EKS pods with HPA + KEDA + Karpenter spot nodes
- AWS App Mesh / Istio for service-to-service mTLS
- ArgoCD GitOps · Flagger metric-based canary deploys per namespace
- Aurora PostgreSQL Multi-AZ + cross-region replica · ElastiCache cluster mode
Stack
Frontend
- Next.js 15 App Router · React 19 · TypeScript
- Tailwind CSS v3 · shadcn/ui · Framer Motion · Lucide
- React Hook Form + Zod for validated forms
Backend & data
- Supabase: Postgres + Auth + RLS + Storage
- PostgREST + custom RPCs for the hot paths
- pgvector (1536-dim) on chapter notes
- @supabase/ssr for cookie-based session handling
Mobile
- Bubblewrap TWA: same web codebase, real APK
- @serwist/next service worker · offline shell
- Digital Asset Links for verified-origin trust
AI
- Anthropic Claude API for the doubt-clearing tutor
- Grounded on retrieved chapter chunks (RAG)
- Prompt caching for cost + latency
AWS — running today (v1)
- EC2 t3.small · Ubuntu 24.04 · Node 20 standalone build
- Route 53 DNS · TLS via Let’s Encrypt + Certbot
- Region: ap-south-1 (Mumbai) · single VM, pm2-supervised
- nginx HTTP/2 + per-class immutable static caching
AWS — migration path (v2 → v3)
- v2: ECS Fargate + ALB + CloudFront + WAF (designed)
- v3: EKS + ArgoCD GitOps + App Mesh mTLS (planned)
- ElastiCache · SQS + Lambda · CloudWatch + X-Ray
- CodeDeploy blue/green → ArgoCD canary deploys
Five engineering decisions
These are the ones I'd want a senior platform engineer to push back on first.
Grounded RAG over a plain LLM call
The AI tutor experiment was the part I least wanted to be a black box. A student asking “why is the answer C on this question about thermodynamics” deserves more than a confident hallucination.
The flow:
- Chapter notes live in public.content_items, each chunked at write time and embedded with a 1536-dim model. The vector lives in a vector(1536) column alongside the chunk text.
- When a doubt is submitted, I embed the doubt + the question text, run a cosine-similarity search on the chapter pool the student's exam covers, and take the top 5 chunks.
- Those chunks go to Claude with a system prompt that explicitly forbids answering outside the retrieved context. The response carries citation markers that map back to topic_id values, and the UI renders them as “→ see Chapter X” links the student can open.
- Per-question retrieval adds ~150 ms before the Claude call. I’m not pre-fetching speculatively because the doubt set is highly variable.
- Cosine similarity in Postgres, not a hybrid BM25 + vector rerank. At my current corpus size (single-digit thousands of chunks) recall is fine. If the corpus crosses ~100k chunks I’ll add a tsvector lexical layer and merge scores.
- I cache the system prompt with the Anthropic prompt-cache headers — the cheapest reliability win I’ve shipped.
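The retrieval and grounding steps above can be sketched in a few lines. This is a minimal in-memory illustration, not the production path: the real similarity search runs inside Postgres via pgvector. The names RagChunk, topKChunks, and buildGroundedPrompt are hypothetical, and so is the exact wording of the system prompt.

```typescript
// Hypothetical shape of a retrieved chunk; the real rows live in public.content_items.
interface RagChunk {
  topicId: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Rank the candidate pool by similarity to the query and keep the best k.
function topKChunks(query: number[], pool: RagChunk[], k = 5): RagChunk[] {
  return pool
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Number each chunk so the model can cite it as [n]; the UI can then map
// [n] back to the chunk's topic_id for the "see Chapter X" link.
function buildGroundedPrompt(doubt: string, chunks: RagChunk[]): { system: string; user: string } {
  const context = chunks
    .map((c, i) => `[${i + 1}] (topic ${c.topicId}) ${c.text}`)
    .join("\n");
  return {
    system:
      "Answer only from the numbered context. Cite sources as [n]. " +
      "If the context does not contain the answer, say so.",
    user: `Context:\n${context}\n\nDoubt: ${doubt}`,
  };
}
```

Keeping the citation index tied to the chunk order means the response parser only needs to resolve [n] against the same array that was sent.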
A 200 KB mobile data budget per session
The real audience is a student in a Tier-2 Indian city on patchy 4G. I set myself a budget: a student should be able to complete a 30-question warm-up mock without burning more than 200 KB of data, ignoring images they choose to view.
That number drives a lot of unglamorous decisions:
- Server components by default. The test runner is a client island because it needs local state for the answer palette and timer — almost everything else is rendered on the server, ships zero JS, and hydrates only what genuinely needs interactivity.
- auth.getSession() instead of auth.getUser() in middleware. The former reads the JWT from the cookie locally; the latter hits Supabase per request. Skipping that round-trip on every navigation alone saved 100–200 ms and an unnecessary egress request. Security note: middleware uses getSession() only for the routing-layer redirect decision. Every protected page and server action re-verifies identity with getUser() (via a React-cached helper, getCachedUser) before any data access, so a forged session cookie that sneaks past the redirect still fails at the data layer.
- Immutable caching on _next/static. The first mock warms the cache; the second mock the student takes is mostly a few KB of JSON answers.
- No third-party JS on the auth shell. No analytics, no chat widget, no webfonts loaded synchronously. The full Inter family is self-hosted from one woff2 with font-display: swap.
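The two-layer auth split described above can be sketched as two pure functions. This is an illustration under assumptions, not the actual middleware: the types CookieSession and VerifiedUser stand in for what getSession() and getUser() return, and the protected prefixes and /login redirect target are made-up route names.

```typescript
// Hypothetical stand-ins: CookieSession is what getSession() decodes locally
// (unverified); VerifiedUser is what getUser() returns after a server round-trip.
interface CookieSession { userId: string }
interface VerifiedUser { id: string }

// Assumed route names, for illustration only.
const PROTECTED_PREFIXES = ["/dashboard", "/mock"];

type RouteDecision = { action: "next" } | { action: "redirect"; to: string };

// Routing layer: cheap, local, and allowed to be fooled by a forged cookie.
function decideRoute(pathname: string, session: CookieSession | null): RouteDecision {
  const isProtected = PROTECTED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/"),
  );
  if (isProtected && session === null) {
    return { action: "redirect", to: `/login?next=${encodeURIComponent(pathname)}` };
  }
  return { action: "next" };
}

// Data layer: every protected page and server action calls this with the
// result of the verified lookup before touching any data.
function assertVerified(user: VerifiedUser | null): VerifiedUser {
  if (user === null) throw new Error("unauthenticated");
  return user;
}
```

The point of the split is that decideRoute never gates data. A forged cookie yields `{ action: "next" }`, but the verified lookup returns null and assertVerified throws.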
It is occasionally annoying — I've had to walk back a client-component implementation more than once — but the constraint forced a saner architecture than I'd have written without it.
TWA, not React Native, for the Android app
I built the Android app as a Trusted Web Activity using Bubblewrap. The package is in.prepatlas.app and the APK is ~3 MB.
Why not React Native or a Capacitor wrap:
- One codebase, one deploy. A site push lands in the app within seconds of pm2 reload. There is no app-store-update cycle for anything that isn't the Android container itself.
- Same SSR, same server actions, same auth cookies. The TWA is Chrome pretending to be an app, so every server-side optimization on the web shows up on mobile for free.
- Honest tradeoffs. I don't have native APIs (camera, biometrics, push). When I need them — most likely for proctored mocks — I'll either add a thin Capacitor bridge or move that subset to native. The TWA covers 98% of the surface today.
The non-obvious work was Digital Asset Links: the assetlinks JSON at /.well-known/assetlinks.json is the only thing that hides Chrome's URL bar in the TWA. Get the SHA-256 fingerprint wrong and the app looks like a website. I have a release-time check that verifies the served JSON matches the production keystore.
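A release-time check of this kind could look roughly like the sketch below. It assumes the body of /.well-known/assetlinks.json has already been fetched and that the expected fingerprint was read from the production keystore; the function name assetLinksMatch is hypothetical. The statement shape follows the Digital Asset Links format for Android app targets.

```typescript
// One statement from an assetlinks.json file (Android app target).
interface AssetLinkStatement {
  relation: string[];
  target: {
    namespace: string;
    package_name?: string;
    sha256_cert_fingerprints?: string[];
  };
}

// Fingerprints are colon-separated hex; compare case-insensitively.
function normalizeFingerprint(fp: string): string {
  return fp.trim().toUpperCase();
}

// Returns true only if some statement grants handle_all_urls to the given
// package with the expected signing-certificate fingerprint.
function assetLinksMatch(body: string, packageName: string, expectedFingerprint: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch (e) {
    return false; // unparseable JSON should fail the release check
  }
  if (!Array.isArray(parsed)) return false;
  const expected = normalizeFingerprint(expectedFingerprint);
  return (parsed as AssetLinkStatement[]).some(
    (s) =>
      Array.isArray(s.relation) &&
      s.relation.includes("delegate_permission/common.handle_all_urls") &&
      s.target !== undefined &&
      s.target.namespace === "android_app" &&
      s.target.package_name === packageName &&
      (s.target.sha256_cert_fingerprints ?? []).some(
        (fp) => normalizeFingerprint(fp) === expected,
      ),
  );
}
```

Called with the served body, the package name in.prepatlas.app, and the keystore's SHA-256 fingerprint, a false result would block the release.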
Question content as a pipeline, not a CMS
PrepAtlas needs tens of thousands of questions across math, reasoning, general awareness, and science. A traditional CMS would have made me hand-write each one. Instead I built three feeders into the same public.questions table:
- Template generator: Python scripts in scripts/ that produce formula-driven MCQs (e.g. percentage problems with parameterized inputs and randomized answer order). About 17k of the current pool came from here.
- Corpus-driven generator: for general awareness and science, where the answers come from a curated facts corpus. About 900 questions, fully deterministic, regenerable.
- Sonnet-assisted batches: for the long tail, currently around 2,100 questions. A separate Claude Code session runs a generation prompt with the topic taxonomy as context, emits JSON, and a Python importer validates the schema, dedupes by hashed prompt, and upserts via the Supabase service-role key.
17k templated + 900 corpus-driven + ~2,100 Sonnet-assisted ≈ 20k total.
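To make the template-generator idea concrete, here is a minimal sketch of one formula-driven percentage MCQ. The actual generators are Python scripts; this is written in TypeScript only to match the app code. The seeded PRNG, the parameter ranges, the distractor scheme, and the names GeneratedMcq and percentageQuestion are all assumptions for illustration.

```typescript
interface GeneratedMcq {
  prompt: string;
  options: string[];
  correctIndex: number;
}

// Small seeded PRNG (mulberry32-style) so the same seed regenerates the same question.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function percentageQuestion(seed: number): GeneratedMcq {
  const rand = seededRandom(seed);
  const pick = (min: number, max: number) => min + Math.floor(rand() * (max - min + 1));

  const percent = pick(2, 19) * 5; // 10, 15, ..., 95
  const base = pick(1, 20) * 20;   // 20, 40, ..., 400
  const answer = (percent * base) / 100; // always an integer for these ranges
  const step = base / 20;                // distractor spacing, at least 1

  // One correct value plus three distinct near-misses.
  const values = [answer, answer + step, answer - step, answer + 2 * step];

  // Fisher-Yates shuffle driven by the seeded PRNG.
  for (let i = values.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = values[i] as number;
    values[i] = values[j] as number;
    values[j] = tmp;
  }

  return {
    prompt: `What is ${percent}% of ${base}?`,
    options: values.map(String),
    correctIndex: values.indexOf(answer),
  };
}
```

Because every random choice flows from the seed, regenerating a batch after a script change is a matter of replaying the same seeds.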
Every question carries a topic_id. Every topic carries an exam_id. The admin UI can override difficulty, mark items as PYQ (previous-year question), or unpublish them; everything else is generated and re-runnable. If a generator script changes, I can roll forward by regenerating that batch without touching the rest.
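The validate-and-dedupe step of the importer can be sketched as follows. The real importer is Python and its hash function is not specified here, so the FNV-1a-style hash is a stand-in, and the field names (prompt, options, correctIndex, topicId), the four-option rule, and the name importBatch are hypothetical.

```typescript
interface CandidateQuestion {
  prompt: string;
  options: string[];
  correctIndex: number;
  topicId: string;
}

// Structural check on untrusted JSON emitted by a generation run.
function isValidQuestion(value: unknown): value is CandidateQuestion {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const prompt = record["prompt"];
  const options = record["options"];
  const correctIndex = record["correctIndex"];
  const topicId = record["topicId"];
  return (
    typeof prompt === "string" && prompt.trim().length > 0 &&
    typeof topicId === "string" && topicId.length > 0 &&
    Array.isArray(options) && options.length === 4 &&
    options.every((o) => typeof o === "string") &&
    typeof correctIndex === "number" && Number.isInteger(correctIndex) &&
    correctIndex >= 0 && correctIndex < options.length
  );
}

// Stand-in hash (FNV-1a style over UTF-16 code units) of the normalized prompt.
function promptHash(prompt: string): string {
  const normalized = prompt.trim().toLowerCase().replace(/\s+/g, " ");
  let hash = 0x811c9dc5;
  for (let i = 0; i < normalized.length; i++) {
    hash ^= normalized.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Keeps the first occurrence of each prompt; seenHashes is updated in place
// so it can carry across batches.
function importBatch(raw: unknown[], seenHashes: Set<string>) {
  const accepted: CandidateQuestion[] = [];
  let invalid = 0;
  let duplicates = 0;
  for (const item of raw) {
    if (!isValidQuestion(item)) { invalid += 1; continue; }
    const hash = promptHash(item.prompt);
    if (seenHashes.has(hash)) { duplicates += 1; continue; }
    seenHashes.add(hash);
    accepted.push(item);
  }
  return { accepted, invalid, duplicates };
}
```

In a real pipeline the hash would more likely be stored as a unique column so the upsert itself rejects repeats; the in-memory set here only illustrates the idea.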
pgvector instead of a dedicated vector store
This is the decision I think the senior infrastructure crowd will most want to challenge, so I'll lead with the constraints.
I evaluated Pinecone, Weaviate, Qdrant, and the obvious “embed yourself” path with FAISS. The deciding factor was operational surface area: every external vector store would have meant a second source of truth, a second backup story, a second outage to monitor, and a second set of credentials in env. With pgvector:
- The vectors live in the same Postgres that has the chapter rows. The lookup is: select chunk_text, topic_id from content_items order by embedding <=> $1 limit 5; (that's the entire retrieval call).
- RLS applies to vectors the same way it applies to every other row.
- The nightly Supabase backup includes embeddings without thinking about it.
- Adding pgvector cost me one extension install and one composite index.
- pgvector with ivfflat is not faster than a dedicated store at large scale. At my current ~10k chunks the search is sub-10ms; up to 100k synthetic chunks I stay under 40ms. Past that I’d add an hnsw index or move to Qdrant — but I’d want the actual signal first.
- I don’t get a hosted UI for inspecting embeddings. I built a small admin route that lists nearest neighbors for a sample query, which has paid for itself debugging retrieval misses.
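For reference, the retrieval call above can be wrapped as a parameterized query. This is a sketch under assumptions: the helper names toVectorLiteral and nearestChunksQuery are hypothetical, and the `{ text, values }` shape is meant for a Postgres client that accepts positional parameters. pgvector accepts a vector as a bracketed text literal, and `<=>` is its cosine-distance operator.

```typescript
const EMBEDDING_DIM = 1536; // matches the vector(1536) column

// Serialize an embedding into pgvector's text input format: [x1,x2,...].
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

interface ParameterizedQuery {
  text: string;
  values: (string | number)[];
}

// Nearest chunks by cosine distance; smaller distance sorts first.
function nearestChunksQuery(embedding: number[], limit = 5): ParameterizedQuery {
  return {
    text:
      "select chunk_text, topic_id from content_items " +
      "order by embedding <=> $1::vector limit $2",
    values: [toVectorLiteral(embedding), limit],
  };
}
```

Validating the dimension before the query leaves the app turns a mismatched embedding model into a clear error instead of a database exception.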
Numbers
The section recruiters skim, so I keep it honest and load-bearing.
What’s next
Two tracks running in parallel. Horizontal product expansion — SSC CGL/CHSL first, then banking (IBPS PO/Clerk, SBI PO), then state-PSC. The content pipeline already takes a (category, exam) tuple, so adding SSC is mostly seeding exams/subjects/chapters and pointing the generators at the new topic taxonomy.
Infrastructure migration — v2 components ship one at a time as load + risk justify each piece. The order I'll move in:
- CloudFront in front of static assets first — cheap latency win, zero risk.
- WAF on the login/signup paths — bot protection before paid signups land.
- ECS Fargate behind an ALB once vertical scaling on EC2 stops being enough.
- ElastiCache + Secrets Manager + CloudWatch dashboards land alongside ECS.
Each migration is a feature flag away from being a rollback, which is the whole point of doing them one at a time. v3 (EKS + service mesh) waits for a team and a real multi-region requirement — premature Kubernetes is the classic resume-driven mistake.
In parallel I'm building adaptive practice (weak-topic detection from past attempt accuracy auto-builds remediation drills) and laying down a proper observability layer — CloudWatch + OpenTelemetry with structured logs. pm2 logs are fine for one person; they won't scale to a team.
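Weak-topic detection from past attempt accuracy could start as simply as the sketch below. The 60% accuracy threshold, the five-attempt minimum, and the names AttemptRecord and weakTopics are assumptions chosen for illustration, not the shipped logic.

```typescript
interface AttemptRecord {
  topicId: string;
  correct: boolean;
}

interface TopicAccuracy {
  topicId: string;
  attempts: number;
  accuracy: number; // fraction of correct answers, 0..1
}

// Topics with enough attempts to judge and accuracy below the threshold,
// weakest first, as candidates for a remediation drill.
function weakTopics(
  attempts: AttemptRecord[],
  threshold = 0.6,
  minAttempts = 5,
): TopicAccuracy[] {
  const tally = new Map<string, { total: number; right: number }>();
  for (const attempt of attempts) {
    const entry = tally.get(attempt.topicId) ?? { total: 0, right: 0 };
    entry.total += 1;
    if (attempt.correct) entry.right += 1;
    tally.set(attempt.topicId, entry);
  }

  const weak: TopicAccuracy[] = [];
  tally.forEach((entry, topicId) => {
    const accuracy = entry.right / entry.total;
    if (entry.total >= minAttempts && accuracy < threshold) {
      weak.push({ topicId, attempts: entry.total, accuracy });
    }
  });
  return weak.sort((x, y) => x.accuracy - y.accuracy);
}
```

The minimum-attempts guard keeps a topic with two unlucky answers from being flagged before there is enough signal.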