A collection of hands-on, runnable demos exploring the building blocks of distributed systems and backend infrastructure. Each folder is a self-contained project - its own package.json (or docker-compose.yml), its own README, and a focused set of scripts you can run in minutes.
The goal isn't to build a library. It's to internalize how things actually work by wiring them up from scratch: WAL streaming between two real Postgres containers, Lua scripts executing atomically inside Redis, a TCP server parsing its own protocol, a Bloom filter's bitset flipping one hash at a time.
Inspired by Arpit Bhayani's system design lectures, various distributed-systems textbooks, and the pattern of "read about it, then build it small enough to fit in your head."
- Who is this for
- Repository Layout
- Prerequisites
- Suggested Learning Path
- Module Index
- Common Conventions
- Troubleshooting
- Further Reading
- License
Who is this for
- Engineers preparing for system design interviews who want concrete references beyond hand-wavy diagrams.
- Backend developers who've used Redis / Kafka / Postgres but never peeked under the hood.
- Anyone who learns better by running code than by reading whitepapers.
Each module has a README that goes deeper than this one - with protocol specs, algorithm derivations, ASCII architecture diagrams, and experiments to try. Start with the module that interests you and follow its README.
Repository Layout
systems/
├── bloom-filters/ Probabilistic set membership
├── consistent-hashing/ Key → node mapping with minimal churn
├── cron-jobs/ BullMQ repeatable jobs on Redis
├── custom-protocol/ Redis-RESP-like TCP protocol from scratch
├── db-replica/ Postgres streaming replication + failover
├── implementations/
│ ├── abuse-masker/ Real-time chat abuse masking with Trie O(n)
│ ├── e-commerce-product-listing/ End-to-end: API + master/replica routing
│ ├── notification-service/ Priority queues, bulk iteration, Bloom dedup
│ └── tinder-feed/ Geospatial queries, Bloom filters, match detection
├── kafka/ Topics, partitions, consumer groups
├── leader-election/ Bully algorithm simulation
├── mcp-server/ MCP server that auto-generates tools from OpenAPI
├── rate-limiter/ 4 algorithms, all atomic Lua on Redis
├── relationa_db_transactions/ 12 SQL scripts: ACID, MVCC, isolation, WAL
├── scaling-db/
│ ├── read-replicas/ Read-replica routing demo
│ └── sharding/ Horizontal sharding by key
└── README.md ← you are here
Prerequisites
You won't need all of these for every module - each module's README lists exactly what it depends on.
| Tool | Used by | Notes |
|---|---|---|
| Node.js 18+ | everything | Most demos are TypeScript or plain JS |
| pnpm (preferred) or npm | everything | Install with npm i -g pnpm |
| Docker + Docker Compose | kafka, cron-jobs, rate-limiter, db-replica, scaling-db, relationa_db_transactions, implementations/* | Used to spin up Postgres, Redis, Kafka, etc. locally |
| psql (optional) | db modules | Handy for interactive exploration; not required since demos use the pg driver |
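| telnet or nc (optional) | custom-protocol | Manual protocol testing |
| Bun | implementations/abuse-masker | Its quick start runs bun install, bun server, and bun client |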
Everything runs locally. There are no cloud dependencies.
Suggested Learning Path
If you're new to distributed systems, here's a path from smallest scope to largest. Each one builds intuition you'll reuse in the next.
1. custom-protocol/ - Start here. Understand how two processes talk over a socket. Everything else in this repo rides on top of ideas from here.
2. bloom-filters/ - A self-contained data structure that shows up everywhere (caches, DB index probes, CDNs).
3. consistent-hashing/ - The routing primitive behind shards, DHTs, and cache clusters.
4. relationa_db_transactions/ - Before you distribute a database, understand what a single-node transactional database guarantees.
5. rate-limiter/ - Your first taste of "atomicity matters under concurrency" on Redis.
6. leader-election/ - How systems agree on "one of us is in charge" without external help.
7. db-replica/ - Real streaming replication between real Postgres containers. Failover included.
8. kafka/ - Move from request/response to event streams; see why "write once, read many" changes system design.
9. cron-jobs/ - Durable background work. Closes the loop on queues + workers + retries.
10. scaling-db/sharding/ - Partitioning data when a single node isn't enough.
11. implementations/e-commerce-product-listing/ - Tie it together in a tiny app with read/write splitting.
Skip around freely - nothing is hard-gated on anything else. This is just one sensible order.
Module Index
Each module below links to its own README, which is where the real content lives.
bloom-filters/
A probabilistic set membership structure that answers "is X in the set?" with one of two answers: definitely not, or might be. Implemented in TypeScript from scratch, plus a demo of the RedisBloom module.
What you'll learn
- Why Bloom filters trade memory for a tunable false-positive rate.
- How to pick m (bit array size) and k (hash function count) for a target FP rate.
- Where Bloom filters fit in real systems: caches, LSM-tree SSTables, CDN cache lookups, spam filters, username availability checks.
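For n expected items and a target false-positive rate p, the standard sizing formulas are m = -n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions. Below is a minimal TypeScript sketch of both, plus a tiny filter. The FNV-1a hash and the double-hashing scheme (position_i = h1 + i·h2 mod m) are choices made for this illustration, not necessarily what bloom-filters/ uses.

```typescript
// Optimal size for n expected items at target false-positive rate p:
//   m = -n * ln(p) / (ln 2)^2 bits,  k = (m / n) * ln 2 hash functions.
function bloomParams(n: number, p: number): { m: number; k: number } {
  const m = Math.ceil((-n * Math.log(p)) / (Math.LN2 * Math.LN2));
  const k = Math.max(1, Math.round((m / n) * Math.LN2));
  return { m, k };
}

// 32-bit FNV-1a; a different seed gives a second, roughly independent hash.
function fnv1a(s: string, seed: number): number {
  let h = seed >>> 0;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class BloomFilter {
  private readonly bits: Uint8Array;

  constructor(private readonly m: number, private readonly k: number) {
    this.bits = new Uint8Array(Math.ceil(m / 8));
  }

  // Double hashing: derive k bit positions from two base hashes.
  private positions(item: string): number[] {
    const h1 = fnv1a(item, 0x811c9dc5);
    const h2 = (fnv1a(item, 0x9747b28c) | 1) >>> 0; // forced odd, so the step is never 0
    const out: number[] = [];
    for (let i = 0; i < this.k; i++) out.push((h1 + i * h2) % this.m);
    return out;
  }

  add(item: string): void {
    for (const pos of this.positions(item)) this.bits[pos >> 3] |= 1 << (pos & 7);
  }

  // false => definitely not in the set; true => might be (false positives possible).
  mightContain(item: string): boolean {
    return this.positions(item).every((pos) => (this.bits[pos >> 3] & (1 << (pos & 7))) !== 0);
  }
}

// 100M items at 1% FP: roughly 9.6 bits per item and 7 hash functions.
const sizing = bloomParams(100_000_000, 0.01);
console.log(sizing.k, (sizing.m / 8 / 2 ** 20).toFixed(1), "MiB"); // 7 114.3 MiB
```

That 100M-item result lines up with the ~114 MB figure quoted in the notification-service/ section. A filter never returns a false negative, which is why mightContain is safe to use as a "definitely not seen" check.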
Quick start
cd bloom-filters
pnpm install
pnpm demo # basic behavior
pnpm false-positive # watch FP rate climb as the filter fills
pnpm redis # RedisBloom module version (requires redis-stack)
consistent-hashing/
Implementation of the ring-based consistent hashing algorithm, with virtual nodes for even distribution.
What you'll learn
- Why hash(key) % N is catastrophic when N changes (most keys move), while consistent hashing only moves ~k/N of k keys.
- The ring model: nodes placed at hash positions, keys assigned to the first node clockwise.
- Why naive consistent hashing gives uneven distribution, and how virtual nodes fix it.
- Real-world usage: Amazon Dynamo, Cassandra, Memcached clients, CDNs.
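A minimal sketch of the ring with virtual nodes, assuming 100 virtual nodes per physical node and an FNV-1a hash with a murmur3-style finalizer (both illustrative choices, not necessarily what consistent-hashing/ uses). It also counts how many of 10,000 keys move when a fifth node joins, under hash % N versus the ring.

```typescript
// FNV-1a plus a murmur3-style finalizer so similar strings ("node1#7", "node1#8") spread out.
function hash32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

class HashRing {
  private ring: { pos: number; node: string }[] = [];

  constructor(private readonly vnodes = 100) {}

  // Each physical node appears at `vnodes` positions on the ring.
  addNode(node: string): void {
    for (let i = 0; i < this.vnodes; i++) this.ring.push({ pos: hash32(`${node}#${i}`), node });
    this.ring.sort((a, b) => a.pos - b.pos);
  }

  removeNode(node: string): void {
    this.ring = this.ring.filter((e) => e.node !== node);
  }

  // Owner = first virtual node clockwise from hash(key), wrapping past the top.
  getNode(key: string): string {
    if (this.ring.length === 0) throw new Error("ring is empty");
    const h = hash32(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].pos < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].node;
  }
}

const keys = Array.from({ length: 10_000 }, (_, i) => `key:${i}`);
const byModulo = (key: string, n: number) => `node${hash32(key) % n}`;
const movedModulo = keys.filter((k) => byModulo(k, 4) !== byModulo(k, 5)).length;

const ring = new HashRing();
["node0", "node1", "node2", "node3"].forEach((n) => ring.addNode(n));
const before = keys.map((k) => ring.getNode(k));
ring.addNode("node4");
const movedRing = keys.filter((k, i) => ring.getNode(k) !== before[i]).length;
console.log(`hash % N moved ${movedModulo}, ring moved ${movedRing} of ${keys.length}`);
```

Under hash % N a key only stays put when h mod 4 equals h mod 5, which holds for 4 of every 20 residues, so about 80% of keys move. On the ring, only keys that land in node4's new arcs move (roughly 1/5), and every one of them moves to node4.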
Quick start
cd consistent-hashing
pnpm install
pnpm demo # key ownership + adding/removing nodes
pnpm scale # compare simple hash vs consistent hash under scaling
pnpm virtual-nodes # distribution with/without virtual nodes
relationa_db_transactions/
Twelve annotated SQL scripts that walk you through transaction internals in PostgreSQL. No application code - just psql and a single Postgres container.
What you'll learn
- ACID properties made concrete: atomicity, consistency, isolation, durability.
- All four SQL isolation levels, with dirty reads, non-repeatable reads, phantom reads, and write skew explored in side-by-side sessions. (PostgreSQL runs READ UNCOMMITTED as READ COMMITTED, so a dirty read never actually appears.)
- MVCC internals, with row versions made visible through the xmin/xmax system columns.
- Deadlocks, savepoints, and the Write-Ahead Log.
- Retry logic patterns for serialization failures.
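Under SERIALIZABLE (and REPEATABLE READ), PostgreSQL aborts one of the conflicting transactions with SQLSTATE 40001, and the client is expected to re-run the whole transaction. A minimal retry wrapper might look like the sketch below. It assumes the driver exposes the SQLSTATE on err.code (node-postgres does) and also retries deadlocks (40P01); withTxRetry and its parameters are hypothetical names, not code from the module.

```typescript
// SQLSTATEs worth retrying: serialization_failure and deadlock_detected.
const RETRYABLE_SQLSTATES = new Set(["40001", "40P01"]);

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-runs the whole transaction body on retryable failures, with exponential backoff plus jitter.
async function withTxRetry<T>(
  runTransaction: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 20,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await runTransaction();
    } catch (err) {
      const code = (err as { code?: unknown } | null)?.code;
      const retryable = typeof code === "string" && RETRYABLE_SQLSTATES.has(code);
      if (!retryable || attempt >= maxAttempts) throw err;
      // Full jitter: random delay in [0, base * 2^(attempt-1)) so clients don't retry in lockstep.
      await sleep(Math.random() * baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

The runTransaction callback should issue BEGIN, every read and write, and COMMIT (rolling back before rethrowing), so each attempt starts from a fresh snapshot. Retrying only the failed statement would reuse stale reads and defeat the point.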
Quick start
cd relationa_db_transactions
docker-compose up -d
docker exec -i transactions_db psql -U admin -d transactions_db < 01_basic_transaction.sql
Several scripts are two-terminal exercises (concurrent sessions) - the script header tells you which.
db-replica/
Two real Postgres containers, streaming replication over WAL, and a manual-failover script. This is the most detailed module in the repo.
What you'll learn
- Why the replica is read-only by design (the standby.signal file, not a config flag).
- How pg_basebackup -R bootstraps a replica and configures it to follow the primary.
- The WAL sender / WAL receiver / startup-process trio on both sides of the connection.
- LSNs (log sequence numbers) - how to tell exactly how far behind a replica is.
- Async vs sync replication trade-offs.
- Manual failover with pg_promote() and what that means for application connection strings.
- What happens during network partitions, and the role of wal_keep_size.
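An LSN such as 16/B374D848 is a 64-bit WAL byte position printed as two hex halves, so replica lag in bytes is a subtraction. Inside PostgreSQL you would normally call pg_wal_lsn_diff(); the sketch below does the same arithmetic client-side, assuming you already fetched pg_current_wal_lsn() from the primary and pg_last_wal_replay_lsn() from the replica. The function names are hypothetical.

```typescript
// "HI/LO" hex halves -> absolute WAL byte position (HI * 2^32 + LO).
function lsnToBytes(lsn: string): number {
  const match = /^([0-9A-Fa-f]{1,8})\/([0-9A-Fa-f]{1,8})$/.exec(lsn.trim());
  if (!match) throw new Error(`not an LSN: ${lsn}`);
  return parseInt(match[1], 16) * 2 ** 32 + parseInt(match[2], 16);
}

// Bytes of WAL the replica has not replayed yet; same arithmetic as pg_wal_lsn_diff(a, b).
function replayLagBytes(primaryLsn: string, replicaReplayLsn: string): number {
  return lsnToBytes(primaryLsn) - lsnToBytes(replicaReplayLsn);
}

console.log(replayLagBytes("16/B374D848", "16/B374D000")); // 2120
```

Plain JavaScript numbers stay exact up to 2^53 bytes (about 9 PB of WAL), which is plenty for a demo; a long-lived monitor could switch to BigInt.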
Quick start
cd db-replica
docker compose up -d
sleep 10
pnpm install
pnpm run demo # basic replication
pnpm run failover # kill primary, promote replica, write to it
scaling-db/read-replicas/
A simpler take on read replicas focused on query routing: writes go to the primary, reads are distributed across replicas.
What you'll learn
- Routing reads vs writes at the application layer.
- How replication lag breaks read-your-own-writes, and how session pinning mitigates it.
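A minimal sketch of application-side routing with session pinning. QueryRouter is a hypothetical class, not the module's code: writes go to the primary, reads round-robin over replicas, and a session that wrote recently keeps reading from the primary for a pin window. The 2-second window is an assumed upper bound on replication lag.

```typescript
// Hypothetical router: writes -> primary; reads -> replicas, unless the session
// wrote recently, in which case it is pinned to the primary to read its own writes.
class QueryRouter {
  private readonly lastWriteAt = new Map<string, number>();
  private nextReplica = 0;

  constructor(
    private readonly replicas: string[],
    private readonly pinMs = 2000, // assumed upper bound on replica lag
    private readonly now: () => number = () => Date.now(),
  ) {}

  route(sessionId: string, sql: string): string {
    const isRead = /^\s*(select|show)\b/i.test(sql); // crude classifier by leading keyword
    if (!isRead) {
      this.lastWriteAt.set(sessionId, this.now());
      return "primary";
    }
    const wroteAt = this.lastWriteAt.get(sessionId);
    const pinned = wroteAt !== undefined && this.now() - wroteAt < this.pinMs;
    if (pinned || this.replicas.length === 0) return "primary";
    const target = this.replicas[this.nextReplica % this.replicas.length];
    this.nextReplica++;
    return target;
  }
}
```

Classifying by leading keyword is a simplification: SELECT ... FOR UPDATE or a SELECT that calls a side-effecting function would be misrouted, which is why real routers usually let the caller say "this is a write".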
Quick start
cd scaling-db/read-replicas
docker compose up -d
pnpm install
pnpm run setup # plain "pnpm setup" is pnpm's own built-in command
pnpm populate
pnpm demo
scaling-db/sharding/
Horizontal partitioning across multiple Postgres instances by a shard key, with an application-side shard manager.
What you'll learn
- How to pick a shard key (and what happens when you pick badly: hot spots, cross-shard joins, rebalancing pain).
- Routing writes and reads to the right shard.
- Cross-shard queries: how scatter/gather works.
- Why resharding is the nightmare people warn you about.
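A minimal in-memory sketch of an application-side shard manager, assuming hash(userId) % N placement over a fixed shard count (the module may map keys differently). Queries that carry the shard key touch one shard; anything else has to scatter to all N shards and gather the results. ShardManager and Order are hypothetical names.

```typescript
interface Order { userId: string; orderId: number; amount: number }

// 32-bit FNV-1a, used only to spread shard keys.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

class ShardManager {
  readonly shards: Order[][];

  constructor(shardCount: number) {
    this.shards = Array.from({ length: shardCount }, () => [] as Order[]);
  }

  shardFor(userId: string): number {
    return fnv1a(userId) % this.shards.length;
  }

  insert(order: Order): void {
    this.shards[this.shardFor(order.userId)].push(order);
  }

  // Has the shard key: route to exactly one shard.
  ordersForUser(userId: string): Order[] {
    return this.shards[this.shardFor(userId)].filter((o) => o.userId === userId);
  }

  // No shard key: scatter to every shard (parallel queries in production), then gather.
  scatterGather(predicate: (o: Order) => boolean): Order[] {
    const partials = this.shards.map((shard) => shard.filter(predicate));
    return ([] as Order[]).concat(...partials);
  }
}
```

Because placement is hash % N, changing the shard count reroutes most keys, which is exactly why resharding hurts (the consistent-hashing/ module shows the alternative).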
Quick start
cd scaling-db/sharding
docker compose up -d
npm install
npm run setup
npm run populate
npm run demo
npm run query # example single-shard vs cross-shard queries
kafka/
A blog-publishing system using Kafka: one producer, two independent consumer groups (search indexer + per-user post counter), multiple partitions.
What you'll learn
- The core mental shift from queues to streams: messages aren't deleted after consumption; consumer groups commit offsets.
- Why Kafka sidesteps the "dual-write problem": the API writes once to a topic and each consumer group updates its own store from it, so a crash between two writes can't leave the stores inconsistent.
- Partitions as the unit of parallelism, with partition keys preserving per-key ordering.
- Consumer group rebalancing: what happens when you start a second consumer in the same group.
- Offset management: stop a consumer, restart it, watch it resume from where it left off.
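Consumer-group offsets are easiest to see in a toy model. The class below is not the Kafka client API: it is an in-memory sketch in which each partition is an append-only array and each group keeps its own committed offset per partition. The key-to-partition hash is a simple stand-in (Kafka's default partitioner uses murmur2).

```typescript
class ToyTopic {
  readonly partitions: string[][];
  private readonly committed = new Map<string, number[]>(); // group -> next offset per partition

  constructor(partitionCount: number) {
    this.partitions = Array.from({ length: partitionCount }, () => [] as string[]);
  }

  // Same key -> same partition -> per-key ordering (stand-in hash, not Kafka's murmur2).
  partitionFor(key: string): number {
    let h = 0;
    for (let i = 0; i < key.length; i++) h = (Math.imul(h, 31) + key.charCodeAt(i)) >>> 0;
    return h % this.partitions.length;
  }

  produce(key: string, value: string): number {
    const partition = this.partitionFor(key);
    this.partitions[partition].push(value); // append-only; consuming never deletes
    return partition;
  }

  private offsetsFor(group: string): number[] {
    let offsets = this.committed.get(group);
    if (!offsets) {
      offsets = this.partitions.map(() => 0);
      this.committed.set(group, offsets);
    }
    return offsets;
  }

  // Read from the group's committed position; other groups are unaffected.
  poll(group: string, partition: number, max = 100): { messages: string[]; nextOffset: number } {
    const start = this.offsetsFor(group)[partition];
    const messages = this.partitions[partition].slice(start, start + max);
    return { messages, nextOffset: start + messages.length };
  }

  // Commit after processing: a crash before this call means the batch is read again.
  commit(group: string, partition: number, nextOffset: number): void {
    this.offsetsFor(group)[partition] = nextOffset;
  }
}
```

Committing only after processing gives at-least-once delivery, which is why consumers such as a per-user post counter need to tolerate seeing a message twice.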
Quick start
cd kafka
docker-compose up -d # Kafka + Kafka UI on :8080
pnpm install
pnpm run search-consumer # terminal 1
pnpm run counter-consumer # terminal 2
pnpm run producer # terminal 3
cron-jobs/
Production-style cron scheduling with BullMQ on Redis. The key insight: there's no scheduler daemon - repeatable jobs are just delayed jobs that re-add themselves when executed.
What you'll learn
- Why BullMQ beats node-cron/setInterval for anything that matters: persistence, retries with backoff, concurrency limits, stalled-job detection, deduplication, graceful shutdown.
- Idempotent schedule registration - safe to run on every deploy.
- Worker failure & recovery: jobs don't pile up when workers are down.
- Schedule reconciliation: syncing a "source of truth" database with Redis runtime state.
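The "no scheduler daemon" idea from this module's introduction fits in a few lines. This toy model is not BullMQ's API (upsertSchedule and tick are hypothetical names): the queue only holds delayed jobs, each repeatable job enqueues exactly one next occurrence when it runs, and registering the same schedule ID twice replaces rather than duplicates it.

```typescript
interface DelayedJob { scheduleId: string; runAt: number; everyMs: number }

class ToyScheduler {
  private delayed: DelayedJob[] = [];
  readonly runs: { scheduleId: string; at: number }[] = [];

  // Idempotent: keyed by scheduleId, so calling it on every deploy never duplicates.
  upsertSchedule(scheduleId: string, everyMs: number, now: number): void {
    if (everyMs <= 0) throw new Error("everyMs must be positive");
    this.delayed = this.delayed.filter((j) => j.scheduleId !== scheduleId);
    this.delayed.push({ scheduleId, everyMs, runAt: now + everyMs });
  }

  // One worker pass: run due jobs; each one re-adds exactly one next occurrence.
  tick(now: number): void {
    const due = this.delayed.filter((j) => j.runAt <= now);
    this.delayed = this.delayed.filter((j) => j.runAt > now);
    for (const job of due) {
      this.runs.push({ scheduleId: job.scheduleId, at: now });
      let next = job.runAt + job.everyMs;
      while (next <= now) next += job.everyMs; // skip missed slots instead of piling them up
      this.delayed.push({ ...job, runAt: next });
    }
  }

  pendingCount(): number {
    return this.delayed.length;
  }
}
```

Because there is only ever one pending job per schedule, a worker that was down for eight intervals runs the job once on recovery instead of eight times.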
Quick start
cd cron-jobs
docker-compose up -d # Redis + Redis Commander on :8081
pnpm install
pnpm run add-schedule # register 4 sample schedules
pnpm run worker # process them
pnpm run list-schedules
implementations/notification-service/
A comprehensive system design guide and TypeScript implementation of a scalable notification service. Covers templates, priority queues (P1/P2/P3), bulk iteration, and Bloom filter deduplication. Simulates Resend, Twilio, Firebase, and APNS providers.
What you'll learn
- Day zero to production: evolve from synchronous single-user flow to fully async, horizontally scalable architecture.
- Asynchronous architecture: control service enqueues, returns immediately; workers send via provider SDKs later.
- The starvation problem: why a single queue fails and how priority queues (P1/P2/P3) solve it.
- Bulk notification pattern: iterator workers read from a users replica and expand each job into individual messages, so the control service doesn't become a bottleneck.
- The deduplication problem: naive tracking vs Bloom filters - storage math (4 GB → 114 MB for 100M users), trade-offs, and where to deduplicate.
- Design principles: separation of concerns, dumb workers, queue decoupling, trading accuracy for efficiency.
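A minimal sketch of why priority queues fix starvation, assuming strict priority across three in-memory queues (PriorityQueues is a hypothetical class). A production design often gives each queue its own workers instead, so that sustained P1 traffic cannot starve P3 in turn.

```typescript
type Priority = "P1" | "P2" | "P3";
interface Notification { id: string; priority: Priority }

const PRIORITY_ORDER: Priority[] = ["P1", "P2", "P3"];

class PriorityQueues {
  private readonly queues: Record<Priority, Notification[]> = { P1: [], P2: [], P3: [] };

  enqueue(n: Notification): void {
    this.queues[n.priority].push(n);
  }

  // Strict priority: always take from the highest non-empty queue.
  next(): Notification | undefined {
    for (const p of PRIORITY_ORDER) {
      const n = this.queues[p].shift();
      if (n) return n;
    }
    return undefined;
  }
}

const pq = new PriorityQueues();
for (let i = 0; i < 100_000; i++) pq.enqueue({ id: `promo-${i}`, priority: "P3" });
pq.enqueue({ id: "otp-42", priority: "P1" });
console.log(pq.next()?.id); // "otp-42", despite 100k queued P3 messages
```

With a single FIFO queue, the OTP would wait behind all 100,000 campaign messages.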
Quick start
cd implementations/notification-service
pnpm install
docker-compose up -d # Redis Stack with Bloom filter support
pnpm demo:single # single notification flow
pnpm demo:bulk # bulk campaign with iterator
pnpm demo:priority # P1 bypasses P3 congestion
pnpm demo:dedup # Bloom filter prevents duplicates
pnpm demo:all # complete walkthrough
rate-limiter/
Four rate-limiting algorithms - Fixed Window, Sliding Window Log, Sliding Window Counter, Leaky Bucket - each implemented as a single atomic Lua script running inside Redis.
What you'll learn
- The naive race condition: why GET → check → INCR leaks requests under concurrency (demonstrated with 20 concurrent requests blowing past a limit of 5).
- Why EVAL (Lua) wraps the read-check-write into a single atomic operation.
- Trade-offs at a glance:
| Algorithm | Memory | Accuracy | Burst handling |
|---|---|---|---|
| Fixed Window | 1 counter / user / window | approximate | allows 2× burst at window boundary |
| Sliding Log | N timestamps / user | exact | no edge burst |
| Sliding Counter | 2 counters / user | approximate | smooths edge burst |
| Leaky Bucket | level + timestamp | exact outflow | enforces steady downstream rate |
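The race is reproducible even in single-threaded JavaScript, because every await between GET and INCR is a window where other requests run. The sketch uses a hypothetical in-memory FakeRedis instead of a real client: naiveAllow does GET, check, INCR as two round trips, while incrIfBelow stands in for an EVAL'd Lua script by reading, checking, and writing with no await in between.

```typescript
// Stand-in for Redis: each async method is one "round trip".
class FakeRedis {
  private readonly data = new Map<string, number>();

  async get(key: string): Promise<number> {
    return this.data.get(key) ?? 0;
  }

  async incr(key: string): Promise<number> {
    const value = (this.data.get(key) ?? 0) + 1;
    this.data.set(key, value);
    return value;
  }

  // Plays the role of an EVAL'd Lua script: read, check, and write with no await in between.
  async incrIfBelow(key: string, limit: number): Promise<boolean> {
    const value = this.data.get(key) ?? 0;
    if (value >= limit) return false;
    this.data.set(key, value + 1);
    return true;
  }
}

// Naive GET -> check -> INCR: other requests run at each await.
async function naiveAllow(redis: FakeRedis, key: string, limit: number): Promise<boolean> {
  const count = await redis.get(key);
  if (count >= limit) return false;
  await redis.incr(key);
  return true;
}

// Fires `concurrency` checks at once and counts how many were allowed.
async function countAllowed(check: () => Promise<boolean>, concurrency: number): Promise<number> {
  const results = await Promise.all(Array.from({ length: concurrency }, () => check()));
  return results.filter(Boolean).length;
}

const naiveStore = new FakeRedis();
const atomicStore = new FakeRedis();
countAllowed(() => naiveAllow(naiveStore, "user:1", 5), 20).then((n) => console.log("naive allowed", n));
countAllowed(() => atomicStore.incrIfBelow("user:1", 5), 20).then((n) => console.log("atomic allowed", n));
```

With 20 concurrent requests and a limit of 5, every naive request reads 0 before any INCR lands, so all 20 get through; the atomic path admits exactly 5.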
Quick start
cd rate-limiter
pnpm install
docker-compose up -d
pnpm fixed-window
pnpm sliding-log
pnpm sliding-counter
pnpm leaky-bucket
pnpm race-condition # naive vs Lua: sometimes more than 5 vs always exactly 5
pnpm all # same workload, all four algorithms, side by side
custom-protocol/
A Redis-RESP-style, text-based, line-delimited protocol on top of raw TCP. Implements SET, GET, DEL, PING, QUIT with simple strings, errors, and bulk strings.
What you'll learn
- How two processes agree on a wire format - the quiet assumption behind every HTTP call, Redis command, and database driver.
- Request framing: TCP delivers a byte stream, not messages, so a \n delimiter or a length prefix is what tells the server where one command ends.
- Why databases and queues often invent their own protocols instead of using HTTP: no header overhead, purpose-built parsing, far lower latency per op.
- The costs: no browser tooling, no Postman, every client has to be written by hand.
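The framing problem and the command loop can be sketched without sockets. The wire format here is assumed for illustration (one command per \n-terminated line, RESP-style replies such as +OK, -ERR, and $<len> bulk strings); check custom-protocol/'s README for the module's actual spec. LineFramer and execute are hypothetical names.

```typescript
// Buffers raw chunks and emits complete lines; one TCP read may carry half a
// command or several commands.
class LineFramer {
  private buffer = "";

  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines: string[] = [];
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      lines.push(this.buffer.slice(0, newline).replace(/\r$/, "")); // tolerate telnet's \r\n
      this.buffer = this.buffer.slice(newline + 1);
      newline = this.buffer.indexOf("\n");
    }
    return lines;
  }
}

// Executes one command against an in-memory map and returns the encoded reply.
function execute(store: Map<string, string>, line: string): string {
  const [rawCmd = "", ...args] = line.trim().split(/\s+/);
  switch (rawCmd.toUpperCase()) {
    case "PING":
      return "+PONG\r\n";
    case "SET":
      if (args.length < 2) return "-ERR SET needs a key and a value\r\n";
      store.set(args[0], args.slice(1).join(" "));
      return "+OK\r\n";
    case "GET": {
      if (args.length !== 1) return "-ERR GET needs exactly one key\r\n";
      const value = store.get(args[0]);
      // Bulk string: length prefix then payload; $-1 means nil.
      // Length counts UTF-16 units here; a real server should count bytes.
      return value === undefined ? "$-1\r\n" : `$${value.length}\r\n${value}\r\n`;
    }
    case "DEL":
      if (args.length !== 1) return "-ERR DEL needs exactly one key\r\n";
      return store.delete(args[0]) ? "+OK\r\n" : "-ERR no such key\r\n";
    case "QUIT":
      return "+BYE\r\n"; // the server would close the socket after writing this
    default:
      return `-ERR unknown command '${rawCmd}'\r\n`;
  }
}
```

In a real server, each socket would own one LineFramer, and the data handler would call push() on every chunk and write back execute()'s reply for each complete line.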
Quick start
cd custom-protocol
npm install
npm run build
npm run server # terminal 1
npm run client # terminal 2
# or: telnet localhost 9999
leader-election/
The Bully Algorithm simulated on a single machine using timers to mimic independent processes. Start with 5 nodes, kill the leader, watch a new one get elected.
What you'll learn
- Why "who monitors the monitor?" is infinite recursion, and leader election is the base case that stops it.
- Heartbeats, randomized election timeouts (to avoid simultaneous-election storms), and ELECTION/OK/COORDINATOR messages.
- Why the Bully algorithm is simple but chatty, and when you'd reach for Raft or Paxos instead.
- Where leader election shows up in production: etcd, Kafka controller, Patroni, Redis Sentinel, Consul.
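One election round can be traced without timers or sockets. This sketch simplifies the module's simulation: bullyElection (a hypothetical name) computes the messages a Bully round would send, but it only follows the highest node that answered OK. In the real algorithm every responder starts its own election, which is exactly why Bully is chatty.

```typescript
type MessageType = "ELECTION" | "OK" | "COORDINATOR";
interface Message { type: MessageType; from: number; to: number }

// Traces one election started by `initiator`; `alive` marks nodes that would answer.
function bullyElection(
  initiator: number,
  allIds: number[],
  alive: Set<number>,
): { leader: number; messages: Message[] } {
  const messages: Message[] = [];
  let candidate = initiator;
  for (;;) {
    const higher = allIds.filter((id) => id > candidate);
    for (const id of higher) messages.push({ type: "ELECTION", from: candidate, to: id });
    const responders = higher.filter((id) => alive.has(id));
    if (responders.length === 0) break; // no live higher ID: the candidate wins
    for (const id of responders) messages.push({ type: "OK", from: id, to: candidate });
    // Simplification: follow only the highest responder's election.
    candidate = Math.max(...responders);
  }
  for (const id of allIds) {
    if (id !== candidate && alive.has(id)) messages.push({ type: "COORDINATOR", from: candidate, to: id });
  }
  return { leader: candidate, messages };
}

// Five nodes, leader 5 has crashed, node 2 notices first.
const round = bullyElection(2, [1, 2, 3, 4, 5], new Set([1, 2, 3, 4]));
console.log(round.leader, round.messages.length); // 4 9
```

Even in this trimmed version, electing node 4 among five nodes costs nine messages; the full algorithm sends more.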
Quick start
cd leader-election
node demo.js # high-level simulation
node bully-algorithm.js # fuller implementation with message types + delays
mcp-server/
A tiny Model Context Protocol server that wraps a Hono + OpenAPI API and auto-generates one MCP tool per allowed route. Add a route to the REST API, restart, and the new tool shows up in tools/list on the next connection. No hand-maintained tool catalogue.
What you'll learn
- What MCP is in plain terms: tools (model-callable functions), resources (pinned read-only data), prompts (slash-style workflow shortcuts).
- How to turn an OpenAPI document into MCP tool definitions - names, titles, descriptions, and input schemas - without writing them twice.
- Why running the MCP handler in the same process as your REST API means you reuse every middleware, validator, and auth check you already wrote.
- The deny list pattern: filter dangerous routes (admin, auth, webhooks, credentials) before registration so they never appear to the model.
- Stateless JSON-RPC over HTTP: initialize, tools/list, tools/call, resources/list, and prompts/list as plain POSTs, no SSE required.
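The core of "tools from OpenAPI" is a pure mapping from path operations to tool definitions (name, title, description, inputSchema), with the deny list applied before anything is registered. The types below model only the slice of OpenAPI 3 this needs; the deny patterns, the operationId fallback, and nesting the JSON body under a body property are assumptions for illustration, not the module's exact rules.

```typescript
// Just the slice of OpenAPI 3 that tool generation needs.
interface Parameter { name: string; in: string; required?: boolean; schema?: object }
interface Operation {
  operationId?: string;
  summary?: string;
  description?: string;
  parameters?: Parameter[];
  requestBody?: { required?: boolean; content?: Record<string, { schema?: object }> };
}
type Paths = Record<string, Record<string, Operation>>;

interface ToolDefinition {
  name: string;
  title?: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, object>; required: string[] };
}

// Assumed deny list: matching routes are dropped before registration, so tools/list never shows them.
const DENY_PATTERNS = [/^\/admin\b/, /^\/auth\b/, /^\/webhooks\b/, /credential/i];

function toolsFromOpenApi(paths: Paths): ToolDefinition[] {
  const tools: ToolDefinition[] = [];
  for (const [path, operations] of Object.entries(paths)) {
    if (DENY_PATTERNS.some((re) => re.test(path))) continue;
    for (const [method, op] of Object.entries(operations)) {
      const properties: Record<string, object> = {};
      const required: string[] = [];
      for (const param of op.parameters ?? []) {
        properties[param.name] = param.schema ?? { type: "string" };
        if (param.required) required.push(param.name);
      }
      const bodySchema = op.requestBody?.content?.["application/json"]?.schema;
      if (bodySchema) {
        properties["body"] = bodySchema; // nested to avoid clashing with parameter names
        if (op.requestBody?.required) required.push("body");
      }
      tools.push({
        name: op.operationId ?? `${method}_${path}`.replace(/[^A-Za-z0-9_]+/g, "_"),
        title: op.summary,
        description: op.description ?? op.summary ?? `${method.toUpperCase()} ${path}`,
        inputSchema: { type: "object", properties, required },
      });
    }
  }
  return tools;
}
```

Because the mapping runs on every startup, adding a route to the REST API is all it takes for a new tool to appear, and a denied route can never leak into tools/list.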
Quick start
cd mcp-server
pnpm install
pnpm server # terminal 1: starts the server on :3333
pnpm demo # terminal 2: walks through initialize → list tools → call tools
Companion blog post: An MCP Server That Writes Itself.
implementations/abuse-masker/
Real-time abuse masking for live-stream chat using a Trie data structure. Socket.IO server with CLI client. Demonstrates why not everything needs to be a microservice.
What you'll learn
- Trie data structure: character-by-character string matching without tokenization.
- O(n) masking algorithm: single-pass traversal of message and trie simultaneously.
- Why NOT a separate service: network calls add milliseconds, trie lookups take microseconds. For pure computation, keep it in-memory.
- Socket.IO rooms: broadcast abstraction for real-time chat.
- Load once, use forever: fetch abuse dictionary on startup, then pure in-memory operations.
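A minimal masking sketch, not the module's code: build a trie from the dictionary once, then at each position walk the trie as far as the message matches and mask the longest word found. Strictly this is O(n·L) for a message of length n and longest dictionary word L, which is linear in n for a fixed dictionary (an Aho-Corasick automaton removes the L factor). It assumes ASCII input so lowercasing keeps indexes aligned, and, like any tokenization-free matcher, it also masks a dictionary word embedded in a longer word.

```typescript
interface TrieNode { children: Map<string, TrieNode>; isWord: boolean }

function buildTrie(words: string[]): TrieNode {
  const root: TrieNode = { children: new Map(), isWord: false };
  for (const word of words) {
    let node = root;
    for (let i = 0; i < word.length; i++) {
      const ch = word[i].toLowerCase();
      let child = node.children.get(ch);
      if (!child) {
        child = { children: new Map(), isWord: false };
        node.children.set(ch, child);
      }
      node = child;
    }
    node.isWord = true;
  }
  return root;
}

// At each position, walk the trie as far as the text allows and mask the
// longest dictionary word that starts there.
function maskAbuse(message: string, root: TrieNode): string {
  const lower = message.toLowerCase();
  const out = message.split("");
  let i = 0;
  while (i < lower.length) {
    let node = root;
    let matchEnd = -1;
    for (let j = i; j < lower.length; j++) {
      const next = node.children.get(lower[j]);
      if (!next) break;
      node = next;
      if (node.isWord) matchEnd = j;
    }
    if (matchEnd === -1) {
      i++;
    } else {
      for (let j = i; j <= matchEnd; j++) out[j] = "*";
      i = matchEnd + 1; // resume after the masked word
    }
  }
  return out.join("");
}

const dictionary = buildTrie(["dumb", "dumber"]); // loaded once at startup
console.log(maskAbuse("You are DUMB and dumber", dictionary)); // You are **** and ******
```

Each lookup is a few map hits per character in the same process, which is the microseconds-versus-milliseconds argument for keeping this out of a separate service.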
Quick start
cd implementations/abuse-masker
bun install
bun server # terminal 1
bun client # terminal 2
bun client # terminal 3
implementations/e-commerce-product-listing/
A tiny product catalog backend - Express, Postgres primary + replica, read-heavy traffic - that wires together lessons from db-replica/ and scaling-db/read-replicas/.
Key design decision: the master handles reads too. Since writes are rare (shop owner edits), there's spare capacity for reads. Customer reads are distributed 50/50 between master and replica. Each query is tagged [MASTER :5432] or [REPLICA:5433] in the logs so you can watch the routing live.
What you'll learn
- How to actually route a connection pool: separate pools for writes vs reads, random distribution across replicas.
- Why "only read from replicas" isn't a law - you route reads based on your actual write volume.
- A realistic replication status endpoint for monitoring.
Quick start
cd implementations/e-commerce-product-listing
docker compose up -d
sleep 10
pnpm install
node src/init-db.js
node src/seed.js
node src/server.js
# then: curl http://localhost:3000/products
implementations/tinder-feed/
Location-based feed system demonstrating geospatial queries, Bloom filter deduplication, and match detection. Implements the core mechanics of a Tinder-like swipe-based matching application.
What you'll learn
- Redis geospatial commands: GEOADD, GEORADIUS, GEODIST for proximity queries.
- Why data size isn't the problem: 600MB for 50M users is trivial; query load (1.67M writes/sec) is the real challenge.
- Bloom filters for "definitely not seen": Zero false negatives guarantee previously-swiped profiles never reappear.
- Feed database design trade-offs: Store candidate ID (extra network call) vs full profile (stale data risk).
- Why NOT store as a list: Document size limits, serialization costs, unbounded growth.
- Async feed generation: Queue-based pattern for non-blocking user experience.
- Match detection: Simple bidirectional interest check in feed database.
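Two of these mechanics fit in a few lines: great-circle distance and the mutual-interest check. The sketch uses the haversine formula with the Earth radius Redis's geo commands use internally (6372797.560856 m). profilesWithin is a brute-force stand-in for GEOSEARCH ... BYRADIUS (Redis itself indexes positions as geohash scores in a sorted set), and the in-memory likes map is a hypothetical substitute for the feed database.

```typescript
const EARTH_RADIUS_M = 6372797.560856; // radius used by Redis geo commands

const toRadians = (deg: number): number => (deg * Math.PI) / 180;

// Haversine great-circle distance on a spherical Earth, in meters.
function distanceMeters(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

interface Profile { id: string; lat: number; lon: number }

// Brute-force stand-in for GEOSEARCH ... BYRADIUS.
function profilesWithin(me: Profile, everyone: Profile[], radiusM: number): string[] {
  return everyone
    .filter((p) => p.id !== me.id && distanceMeters(me.lat, me.lon, p.lat, p.lon) <= radiusM)
    .map((p) => p.id);
}

// Match = both users swiped right on each other.
function isMatch(likes: Map<string, Set<string>>, a: string, b: string): boolean {
  return (likes.get(a)?.has(b) ?? false) && (likes.get(b)?.has(a) ?? false);
}
```

The linear scan is fine for a handful of profiles; at 50M users, the geohash index is what keeps proximity queries cheap.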
Quick start
cd implementations/tinder-feed
pnpm install
docker-compose up -d
pnpm run init # plain "pnpm init" is pnpm's built-in package.json generator
pnpm seed
pnpm demo:all
Common Conventions
A few patterns repeat across modules so the whole repo feels consistent.
Package manager. pnpm is preferred; npm works for anything with a lockfile. A couple of older modules still use npm.
Type checking. Every folder that ships a tsconfig.json exposes pnpm run type-check (tsc --noEmit). To verify all TypeScript modules at once from the repo root:
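./scripts/typecheck-all.sh
Entry scripts. Most modules expose commands via package.json scripts - pnpm demo, pnpm run worker, etc. Check each module's README for the full list. Use pnpm run <name> whenever a script name collides with a built-in pnpm command (for example setup or init).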
Docker. Anything that needs Redis, Kafka, or Postgres ships a docker-compose.yml so you don't pollute your system. Run docker compose down -v when you're done to remove the containers and reclaim their volumes (this deletes the demo data).
Ports. Each module tries to pick unused ports, but conflicts happen if you run two at once. Relevant defaults:
- Redis: 6379
- Kafka UI: 8080
- Redis Commander: 8081
- Postgres primary: 5432
- Postgres replica: 5433
- Custom-protocol server: 9999
- E-commerce backend: 3000
- MCP server: 3333
Logs over UIs. Every demo prints heavily - which node got the write, which partition received the message, which replica served the read. Read the terminal, not just the UI.
Self-contained. You can rm -rf any top-level folder without breaking anything else.
Troubleshooting
"Port already allocated"
Another container or local service is holding the port. List usage and stop the offender:
lsof -i :5432 # or whichever port
docker ps # look for the relevant container
docker stop <id>
Docker Compose says the version attribute is obsolete
Harmless warning on newer Compose versions. Ignore, or remove the version: line from the YAML.
Postgres replica won't connect
docker logs pg_replica
docker logs pg_primary
Common causes: pg_hba.conf not allowing the replicator user, the primary hasn't finished its init scripts yet, or a stale data volume from a previous run. Run docker compose down -v and start fresh.
Redis commands "succeed" but limits leak
You're probably using a naive GET/INCR flow instead of a Lua script. See rate-limiter/ → pnpm race-condition for the demonstration and fix.
pnpm complains about lockfile / workspace
Each subfolder is an independent project. Run commands from inside the relevant folder, not from the repo root.
Further Reading
Things that informed this repo, roughly in order of "how much they shaped my thinking":
- Arpit Bhayani - arpitbhayani.me - the lecture series this repo started as homework for.
- Designing Data-Intensive Applications - Martin Kleppmann. If you read only one book on this topic, read this one.
- Database Internals - Alex Petrov. Goes deeper on storage engines, B-trees, LSM trees, and replication.
- PostgreSQL docs - specifically the chapters on high availability, WAL, and MVCC.
- Kafka: The Definitive Guide - for going deeper than the kafka/ module.
- Redis in Action and the Redis command docs - especially the sections on Lua scripting and keyspace design.
Companion write-ups for many demos live on the blog: System Design. Each module README links to its closest post where one exists.
License
MIT. Use any of this as you like - take ideas, copy snippets into your own projects, fork and extend. Attribution appreciated but not required.