04 Event Loop Design
Keywords: Event Loop, Reactor Pattern, Proactor Pattern, Netty, Java NIO, Selector, Epoll, Kqueue, IOCP, Non-blocking I/O, Thread-Per-Request, Boss/Worker Threads, Dispatch Loop, Backpressure, Throughput, Latency, Tail Latency, Fairness, Starvation, Mechanical Sympathy, Event-Driven Architecture, Concurrency, Saturation
An event loop is one of the most important architectural patterns in high-performance systems.
It is the mechanism that allows a small number of threads to coordinate a very large number of events efficiently.
Instead of asking:
Which thread should handle this request?
a reactive system asks:
What is ready right now?
This is the core idea behind:
- Java NIO
- Netty
- Vert.x
- Spring WebFlux
- reactive servers
- high-concurrency gateways
- low-latency messaging systems
- websocket platforms
- proxy servers
- streaming systems
The event loop exists because thread-per-request architectures eventually hit a wall:
- thread explosion
- context switching overhead
- memory pressure
- unstable tail latency
- poor scalability under idle connections
- excessive scheduler contention
A well-designed event loop is not just a loop.
It is a control plane for:
- I/O coordination
- readiness detection
- task dispatch
- overload protection
- fairness
- latency control
- CPU efficiency
- connection management
This page explains how to design event loops that are fast, safe, and production-grade.
The C10K Problem: Event loops are the definitive architectural answer to the challenge of handling 10,000+ concurrent connections efficiently on a single machine.
To understand why the event loop exists, you must understand what it replaces.
Traditional servlet-based systems often use this architecture:
1 Request
↓
1 Thread
↓
Blocking I/O
↓
Business Logic
↓
Response
This looks simple, but it becomes expensive at scale.
- each thread consumes memory
- each thread competes for CPU scheduling
- blocked threads waste resources
- idle connections still occupy thread slots
- context switching becomes dominant
- stack memory grows quickly
- throughput drops under load
A single Java thread may consume around 1 MB of stack memory in many real deployments.
10,000 concurrent idle connections can therefore consume enormous memory just to wait.
A non-blocking system uses a different architecture:
Connections
↓
Event Loop
↓
Ready Events
↓
Dispatch / Handoff
↓
Business Processing
↓
Response
Instead of waiting on every connection, the event loop monitors readiness and reacts only when there is actual work.
This creates several advantages:
- fewer threads
- lower context switching overhead
- lower memory usage
- better scalability with many idle connections
- more predictable resource consumption
| Feature | Thread-Per-Request | Event Loop |
|---|---|---|
| Blocking | Yes | No |
| Context Switching | High | Very Low |
| Memory Footprint | Heavy | Lightweight |
| Scalability | Limited by threads | Limited by CPU, network, and downstream capacity |
| Fairness | Depends on scheduling | Explicitly designed |
| Tail Latency | Often unstable under load | Can be tightly controlled |
| Complexity | Simpler linear code | More architectural discipline required |
The event loop is the implementation of the Reactor Pattern.
The Reactor responds to I/O events by dispatching them to the appropriate handler.
Architecture:
I/O Source
↓
Event Demultiplexer
↓
Event Loop / Reactor
↓
Handler
↓
Business Logic
The selector is the event demultiplexer.
The event loop reads the ready set and sends work to handlers.
This is the foundation of most high-performance non-blocking systems in Java.
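The reactor flow above can be sketched in plain Java NIO. This is a minimal, illustrative skeleton (class and method names are invented for the example), not a production reactor:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Minimal reactor skeleton: the Selector is the event demultiplexer,
// the loop reads the ready set and dispatches to handlers.
public class MiniReactor {
    public static Selector openReactor(int port) throws IOException {
        Selector selector = Selector.open();               // event demultiplexer
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port));          // port 0 = any free port
        server.configureBlocking(false);                   // required before registering
        server.register(selector, SelectionKey.OP_ACCEPT); // interest: new connections
        return selector;
    }

    // One non-blocking pass of the loop: report how many channels are ready.
    public static int pollOnce(Selector selector) throws IOException {
        return selector.selectNow(); // production loops use select() to block instead
    }
}
```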
Visual 1.2: Reactor pattern (readiness-based) vs Proactor pattern (completion-based).
While the Event Loop is the heart of the Reactor pattern, it's important to distinguish it from its cousin:
- Reactor (Java NIO / Netty): The Loop waits for a resource to become ready (e.g., "data is available to read"). You perform the actual I/O.
- Proactor (Windows IOCP / AIO): You tell the OS to perform the I/O in the background. The Loop is notified only when the operation is complete.
Note: Java's high-performance networking is almost entirely based on the Reactor pattern due to OS portability.
A single thread accepts connections, reads data, processes it, and writes the response.
This model is famously used in systems such as:
- Redis
- Node.js-style single-loop designs
- some embedded or specialized high-performance services
Strengths:
- zero lock contention inside the loop
- simple state reasoning
- predictable ordering

Weaknesses:
- limited multi-core utilization
- one slow task can freeze all connections assigned to the loop
- not ideal for mixed I/O + CPU workloads
If processing one event takes 1 second, all other connections handled by that loop wait.
This is the architecture commonly used by Netty.
It separates the accepting phase from the processing phase.
Architecture flow:
Client
↓
Boss Event Loop Group
↓
Accept Connection
↓
Register Channel with Worker Event Loop Group
↓
Worker Handles Read / Write / Dispatch
Boss Event Loop Group:
- usually a very small pool
- often just 1 thread
- listens for incoming connections
- handles `OP_ACCEPT`
- hands accepted channels to workers

Worker Event Loop Group:
- handles actual read/write readiness
- processes `OP_READ`, `OP_WRITE`, `OP_CONNECT`
- usually sized based on CPU cores
- keeps each channel bound to a stable loop for locality
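The boss/worker split can be illustrated in plain JDK NIO (Netty's real implementation is far more elaborate; all class and field names here are invented for this sketch):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Sketch: the boss selector only accepts connections, then registers
// each accepted channel with a separate worker selector.
public class BossWorkerSketch {
    public final Selector boss;
    public final Selector worker;
    public final ServerSocketChannel server;

    public BossWorkerSketch() throws IOException {
        boss = Selector.open();
        worker = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));         // any free port
        server.configureBlocking(false);
        server.register(boss, SelectionKey.OP_ACCEPT); // boss only cares about accepts
    }

    // One boss iteration: accept ready connections, hand them to the worker.
    public int acceptPending() throws IOException {
        int handed = 0;
        boss.selectNow();
        var it = boss.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();                               // avoid reprocessing the same event
            if (key.isAcceptable()) {
                SocketChannel ch = server.accept();
                ch.configureBlocking(false);
                ch.register(worker, SelectionKey.OP_READ); // the worker loop now owns reads
                handed++;
            }
        }
        return handed;
    }
}
```

In a real server, a pool of worker selectors (one per thread) would share accepted channels; here a single worker selector stands in for the whole group.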
An event loop is literally a repeating control cycle.
Simplified pseudo-code:

```java
while (!isShutdown) {
    selector.select();                                     // block until events or wakeup
    Set<SelectionKey> readyKeys = selector.selectedKeys(); // select() returns a count; the keys come from selectedKeys()
    processSelectedKeys(readyKeys);
    runAllTasks();
}
```
Visual 1.3: The high-performance cycle: Polling events → Dispatching handlers → Running scheduled tasks.
The loop does three major things:
- waits for events from the OS,
- processes ready I/O events,
- runs scheduled asynchronous tasks.
This is the mechanical core of the reactor.
A production-grade Event Loop doesn't just process I/O; it must manage tasks fairly.
- Task Slicing: To prevent one massive task from "starving" others, Netty uses an `ioRatio`. It limits how much time is spent on non-I/O tasks vs. I/O events in a single cycle.
- The "JDK Epoll Bug" Fix: There is a famous bug where `Selector.select()` wakes up for no reason, causing 100% CPU usage. Netty detects this "spinning" and automatically rebuilds the Selector on the fly.
Event loops achieve mechanical sympathy by leveraging OS-level mechanisms like epoll (Linux) and kqueue (macOS). This ensures zero wasted CPU cycles on idle connections.
The event loop usually sits on top of a Selector.
The selector detects readiness.
The event loop decides what to do with it.
Relationship:
Selector = readiness detection
Event Loop = dispatch and control
A selector by itself is not enough.
The event loop gives it policy, fairness, and lifecycle control.
A production event loop usually follows this lifecycle:
The loop blocks efficiently while waiting for readiness notifications.
In Java NIO, this is usually done through:
```java
selector.select();
```

This means:
- sleep without burning CPU
- wake up when channels are ready
- continue only when work exists
Once awakened, the loop retrieves ready events.
Example:
```java
Set<SelectionKey> selectedKeys = selector.selectedKeys();
```

These keys represent channels that are ready for operation.
Each key is inspected and sent to the correct handler.
Possible event types:
- accept
- read
- write
- connect
The event loop must make dispatch decisions quickly.
Processed keys must be removed from the selected set.
Otherwise, the same event may be processed repeatedly.
This is one of the most common bugs in NIO-based systems.
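A minimal demonstration of the correct draining pattern (names are illustrative):

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

// Demonstrates why processed keys must be removed from the selected set:
// the Selector never removes them for you.
public class SelectedKeyDemo {
    public static int drain(Selector selector) throws IOException {
        selector.selectNow();
        int processed = 0;
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();   // critical: otherwise the same key is seen again next cycle
            processed++;
        }
        return processed;
    }
}
```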
The loop continues forever or until shutdown is requested.
Java NIO event loops usually work with these readiness events:
| Event | Meaning |
|---|---|
| OP_ACCEPT | A new incoming connection is ready |
| OP_CONNECT | A connection has been established |
| OP_READ | Data is available to read |
| OP_WRITE | Socket is ready to accept more data |
Each event type requires different handling logic.
The loop must route each one correctly.
This distinction is critical:
- readiness means the channel can be operated on
- work means the actual processing you do after readiness
The event loop detects readiness.
The handler performs work.
Do not confuse them.
If you mix readiness detection with heavy processing, your loop becomes slow and unstable.
The event loop is not just a polling machine.
It is a dispatching machine.
Typical structure:
Selector
↓
Ready Key Set
↓
Event Dispatcher
↓
Handler
↓
Task Queue or Worker Pool
A good dispatch model separates:
- I/O coordination
- protocol parsing
- business logic
- persistence
- downstream calls
This separation keeps the loop responsive.
Event loops scale well because they reduce:
- thread count
- blocking time
- scheduling overhead
- memory usage
- lock contention
Instead of many idle threads, you have a smaller number of active coordination threads.
This is especially valuable in workloads with:
- many idle connections
- bursty traffic
- chat systems
- gateways
- proxy services
- websocket servers
- streaming platforms
- SSE connections
- IoT device fleets
These are not the same thing.
Event Loop
Purpose:
Manage readiness and dispatch
Best for:
- I/O coordination
- non-blocking sockets
- readiness polling

Thread Pool
Purpose:
Execute independent tasks
Best for:
- CPU-bound work
- blocking work
- background processing
- parallel computation
A strong architecture usually combines both:
Event Loop → Worker Pool → Business Logic
A production-grade architecture often looks like this:
Client Connections ➔ Event Loop ➔ Dispatch ➔ Worker Thread Pool ➔ Business Logic ➔ Response
This separation is critical.
If the event loop starts doing business logic itself, performance degrades quickly.
If the worker pool is unbounded, overload spreads.
If the dispatch layer is slow, latency grows.
Everything matters.
- Visual 1.1: Side-by-side comparison of Thread-Per-Request vs Event Loop.
- Visual 1.2: Netty's Boss/Worker threading model architecture.
- Visual 1.3: Data flow of Zero-Copy (Disk to NIC bypassing JVM heap).
Blocking inside an event loop is catastrophic.
Examples of blocking operations:
- database calls
- file I/O
- network calls to other services
- long CPU tasks
- sleep calls
- synchronous remote APIs
- slow logging sinks
Why it is dangerous:
- the event loop cannot service other channels
- tail latency increases
- queue depth grows
- throughput collapses
- timeouts cascade
- one connection can freeze thousands
This is one of the most common production mistakes in event-driven systems.
Visual 1.4: Impact of blocking on the loop vs. isolating tasks to dedicated worker pools.
❌ Anti-Pattern Example: EventLoop-1 handles 1,000 connections. Connection A performs a blocking JDBC call taking 5 seconds. Result: All other 999 connections freeze completely for those 5 seconds.
✅ Best Practice: Always offload blocking work to a dedicated worker pool, ideally bounded so that overload stays visible instead of hiding in an ever-growing queue.
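A sketch of that handoff using a bounded JDK thread pool (pool sizes and names are illustrative, and the "query" is simulated rather than a real JDBC call):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: blocking work (e.g., a JDBC call) is offloaded to a dedicated,
// bounded pool so the event loop thread never waits on it.
public class BlockingOffload {
    static final ExecutorService BLOCKING_POOL = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(100),              // bounded: overload stays visible
            new ThreadPoolExecutor.CallerRunsPolicy()); // push back instead of failing silently

    public static CompletableFuture<String> queryAsync(String sql) {
        // The loop thread calls this and returns to select() immediately.
        return CompletableFuture.supplyAsync(() -> {
            // pretend this is the slow blocking call
            return "rows-for:" + sql;
        }, BLOCKING_POOL);
    }
}
```

The completion callback can then hand the result back to the owning event loop, so channel state is still touched by only one thread.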
An event loop must handle work fairly.
If one connection produces too much work, it can monopolize the loop.
Problems include:
- one client starving others
- hot channels dominating the ready set
- uneven latency
- processing bias
- unfair wakeup patterns
Good event loop design uses:
- bounded work per iteration
- fair dispatching
- task slicing
- handoff to workers when needed
Backpressure is essential.
Without it, the event loop can accept more work than it can handle.
Symptoms of missing backpressure:
- queue growth
- memory growth
- buffer buildup
- increased latency
- collapse under load
Backpressure mechanisms include:
- bounded queues
- limited per-connection work
- rejected tasks
- write interest toggling
- adaptive throttling
- rate limiting
- dropping low-priority work
Event loops must stay stable under overload.
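One of the simplest backpressure building blocks is a bounded queue whose rejection is visible to the caller. A minimal sketch (capacity and names are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Backpressure sketch: a bounded per-connection outbound queue that
// rejects instead of growing without limit.
public class BackpressureSketch {
    private final BlockingQueue<byte[]> outbound;

    public BackpressureSketch(int capacity) {
        this.outbound = new ArrayBlockingQueue<>(capacity);
    }

    // offer() returns false when full — the caller can drop, throttle, or close.
    public boolean enqueue(byte[] frame) {
        return outbound.offer(frame);
    }

    public int depth() { return outbound.size(); }
}
```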
OP_WRITE is often misunderstood.
A socket is frequently writable.
If you keep write interest enabled all the time, the selector may wake continuously even when there is no meaningful work.
This can lead to:
- busy loops
- CPU spikes
- repeated wakeups
- wasted scheduling cycles
- self-inflicted overload
Correct strategy:
- enable write interest only when there is queued outbound data
- disable it after flushing the buffer
This is one of the most important optimization rules in NIO-based event loops.
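The toggle rule can be expressed as a small helper (illustrative sketch):

```java
import java.nio.channels.SelectionKey;

// Sketch of the OP_WRITE toggle rule: register write interest only while
// there is queued outbound data, and clear it after flushing.
public class WriteInterestSketch {
    public static void setWriteInterest(SelectionKey key, boolean hasPendingData) {
        int ops = key.interestOps();
        if (hasPendingData) {
            key.interestOps(ops | SelectionKey.OP_WRITE);  // wake when writable
        } else {
            key.interestOps(ops & ~SelectionKey.OP_WRITE); // stop spurious wakeups
        }
    }
}
```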
A single loop iteration should not try to do everything.
A production event loop often slices work into smaller units.
Example:
- accept a connection
- read a fixed amount of data
- queue the rest
- return to the loop
Why this matters:
- prevents starvation
- keeps latency predictable
- improves fairness
- avoids monopolization by one channel
Large tasks should be chunked and offloaded.
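A sketch of bounded work per iteration (the limit and names are illustrative):

```java
import java.util.Queue;

// Task-slicing sketch: process at most maxPerIteration items per loop pass
// so one busy channel cannot monopolize the loop.
public class TaskSliceSketch {
    public static int runSlice(Queue<Runnable> tasks, int maxPerIteration) {
        int ran = 0;
        Runnable task;
        while (ran < maxPerIteration && (task = tasks.poll()) != null) {
            task.run();
            ran++;
        }
        return ran; // leftover tasks wait for the next iteration
    }
}
```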
The event loop itself is a stateful machine.
Common states:
| State | Meaning |
|---|---|
| Starting | Initial setup |
| Running | Normal operation |
| Draining | Finishing queued work |
| Stopping | No new work accepted |
| Terminated | Shutdown complete |
Good systems define clear transitions.
Without clear lifecycle control, shutdown becomes messy.
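A minimal sketch of explicit lifecycle transitions (the forward-only rule here is a simplification for illustration; real loops may allow e.g. Running → Stopping directly):

```java
// Lifecycle sketch: an explicit state enum that permits only legal transitions.
public class LoopLifecycle {
    public enum State { STARTING, RUNNING, DRAINING, STOPPING, TERMINATED }

    private State state = State.STARTING;

    // Only the next state in the sequence is accepted.
    public synchronized boolean transitionTo(State next) {
        boolean legal = next.ordinal() == state.ordinal() + 1;
        if (legal) state = next;
        return legal;
    }

    public synchronized State state() { return state; }
}
```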
A proper event loop must stop cleanly.
Graceful shutdown should:
- stop accepting new events,
- finish or cancel existing work,
- close channels safely,
- release selector resources,
- terminate workers if needed.
A poor shutdown can leave:
- half-open sockets
- leaked selectors
- lingering threads
- resource leaks
- inconsistent state
Shutdown is part of design, not an afterthought.
```java
// Standard Netty graceful shutdown
public void shutdown() {
    bossGroup.shutdownGracefully();   // stop accepting new connections
    workerGroup.shutdownGracefully(); // drain pending tasks, close channels, release selectors
}
```

There are several event-loop topologies.
One event loop handles all events.
Strengths:
- simple
- easy to understand
- lower coordination overhead

Weaknesses:
- limited scalability
- can become a bottleneck
- poor multi-core utilization
Multiple event loops share the load.
Strengths:
- better scaling on multi-core systems
- higher throughput
- more isolation
- lower per-loop pressure

Weaknesses:
- more complex
- requires better coordination
- more careful channel assignment
Large systems often use a multi-reactor model.
There are different threading approaches around event loops.
Single loop with a worker pool:
- one loop for readiness
- worker pool for heavy tasks

This is common and practical.

Multiple loops:
- each loop handles a subset of channels
- better hardware utilization
- more complexity
- strong cache locality

This is common in high-performance frameworks.

Specialized pools:
- I/O loop
- CPU pool
- blocking pool
- scheduled pool

This is often the most production-friendly design.
The Reactor pattern is not the only event-based model.
Reactor:
- waits for readiness
- dispatches when channels are ready
- common in Java NIO and Netty

Proactor:
- starts asynchronous operations
- receives completion notifications
- common in completion-based I/O systems
The architectural difference is subtle but important:
- Reactor: “tell me when it is ready”
- Proactor: “tell me when it is done”
The event loop achieves mechanical sympathy because it maps well to modern operating system capabilities.
Instead of asking 10,000 sockets:
Are you ready?
Are you ready?
Are you ready?
the loop uses OS-level mechanisms like:
- epoll on Linux
- kqueue on macOS / BSD
- IOCP on Windows in completion-oriented models
This avoids wasting CPU cycles on idle connections.
The OS is asked to do readiness tracking efficiently.
Why does Netty often keep a specific connection tied to the same event loop thread?
Because of CPU cache locality.
If a connection is processed by the same thread repeatedly:
- data stays warm in L1/L2 caches
- less cache invalidation
- less memory traffic
- better branch prediction
- lower latency
If a connection bounces between threads:
- caches are constantly invalidated
- memory access becomes slower
- overhead rises sharply
Stable thread-to-connection affinity is often a major performance win.
Visual 1.5: Zero-Copy flow moving data from disk to NIC bypassing the JVM heap.
To push performance to the absolute limit, the Event Loop utilizes:
- Zero-Copy I/O: Using `FileChannel.transferTo()`, data moves directly from disk to the network buffer without ever entering the JVM Heap. This saves CPU cycles and memory bandwidth.
- Event Batching: Instead of waking up for every single packet, the loop can gather multiple ready events in one `poll()` call, drastically reducing the cost of system calls (syscalls).
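A minimal `transferTo()` sketch; the target here is a file channel for demonstration, but the same call applies to a socket channel:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Zero-copy sketch: FileChannel.transferTo lets the kernel move bytes
// without staging them through a JVM-heap buffer.
public class ZeroCopySketch {
    public static long transfer(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long sent = 0, size = in.size();
            while (sent < size) {
                // transferTo may move fewer bytes than requested, so loop until done
                sent += in.transferTo(sent, size - sent, target);
            }
            return sent;
        }
    }
}
```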
Common fatal mistakes:
- Blocking inside the loop: this is the most dangerous mistake.
- Running heavy CPU work inline: this increases tail latency and harms fairness.
- Not removing processed keys from the selected set: this leads to repeated handling and busy loops.
- Leaving write interest enabled permanently: this can cause endless wakeups.
- Using unbounded handoff queues: this leads to hidden overload and memory growth.
- Mixing business logic into the loop: this makes the loop fragile and hard to scale.
- Running everything on a single loop: this wastes available cores and creates a bottleneck.
A production event loop should be observable.
Important metrics:
| Metric | Meaning |
|---|---|
| Loop Iteration Time | How long each cycle takes |
| Ready Key Count | How many events are detected |
| Dispatch Time | How long handling takes |
| Queue Depth | How much work is waiting |
| Rejection Count | How often overload happens |
| Wakeup Count | How often the loop is interrupted |
| Tail Latency | Worst-case response behavior |
| Busy Loop Rate | Indicator of accidental spinning |
| Idle Time | Whether the loop is underutilized or sleeping appropriately |
Metrics reveal whether the loop is healthy.
Java 21 introduced Virtual Threads, changing the landscape. How do they compare?
- Event Loops: Still the gold standard for Network Proxies, Gateways, and Message Brokers where maximum throughput and fine-grained control over I/O are required.
- Virtual Threads: The best choice for Standard CRUD/Business APIs. They allow you to write simple, blocking code that scales like an Event Loop.
The Hybrid Rule: Use Event Loops (Netty) for your infrastructure/networking layer and consider Virtual Threads for your heavy business logic layer.
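A sketch of the virtual-thread style (requires Java 21+; the task count and sleep duration are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Virtual-thread sketch: plain blocking-style code, one cheap virtual
// thread per task, no explicit event loop in the application code.
public class VirtualThreadSketch {
    public static int runBlockingTasks(int count) {
        AtomicInteger done = new AtomicInteger();
        try (ExecutorService vt = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                vt.submit(() -> {
                    try { Thread.sleep(10); } catch (InterruptedException ignored) { }
                    done.incrementAndGet(); // stands in for blocking business logic
                });
            }
        } // close() waits for all submitted tasks to finish
        return done.get();
    }
}
```

Under the hood the JVM still multiplexes these virtual threads onto a small set of carrier threads, which is why blocking code scales here without an application-level loop.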
A chat application built on Spring Boot with a traditional thread-per-request server model needed to support 100,000 concurrent WebSockets.
At around 8,000 users, the JVM threw:
`OutOfMemoryError: unable to create new native thread`
The system was using huge amounts of memory just to keep idle WebSocket threads alive.
The team migrated to an event-driven architecture using Netty via a reactive stack.
- memory usage dropped dramatically
- context switching overhead disappeared
- throughput stabilized
- the system could handle far more idle long-lived connections
- the architecture became suitable for websocket-style workloads
For long-lived, mostly idle connections like:
- WebSockets
- SSE
- chat sessions
- real-time dashboards
event loops are often the correct architectural choice.
A strong event loop design usually follows these rules:
- keep the loop lightweight
- never block in the loop
- bound work per iteration
- hand off expensive work
- use backpressure
- make shutdown explicit
- monitor queue growth
- prioritize fairness
- separate I/O from business logic
- keep per-connection state minimal
- preserve cache locality
- avoid unnecessary wakeups
- use specialized pools for blocking work
These rules are what make event-driven systems stable.
If your event-driven system is slow, check for these fatal mistakes:
- Hidden blocking: using `InputStream`, `URLConnection`, JDBC, or other blocking APIs inside a reactive chain.
- Synchronous logging: writing logs to a slow sink inside the event loop.
- God threads: having one loop do everything instead of using available cores.
- Lack of backpressure: accepting more work than downstream systems can process.
- Unbounded handoff queues: hiding overload until memory fails.
- Mixing business logic with readiness logic.
- Enabling write readiness permanently.
- Doing large CPU work inline.
- Ignoring per-connection fairness.
If your Event Loop is struggling, use these professional diagnostic tools:
| Tool/Flag | Purpose | Key Metric |
|---|---|---|
| `-Dio.netty.eventLoopThreads=N` | Manual Thread Tuning | Compare throughput vs. core count. |
| `jcmd <pid> Thread.print` | Thread Dump | Look for BLOCKED states in `nioEventLoop` threads. |
| async-profiler | CPU Profiling | Check for "Selector Spinning" or high `select()` time. |
| JFR (Flight Recorder) | Latency Analysis | Look for "Socket Read" events exceeding your p99 targets. |
Visual 1.6: Visualizing bottlenecks using async-profiler flame graphs and JFR timelines.
- Visual 1.1: A side-by-side of 1,000 threads (heavy) vs. 1 Event Loop (light).
- Visual 1.2: A diagram showing "Boss" passing a key to a "Worker" queue.
- Visual 1.3: A "Zero-Copy" flow: Disk → Kernel Buffer → NIC (bypassing User Space).
Event loops are foundational in:
- Netty
- Vert.x
- reactive servers
- websocket gateways
- message brokers
- low-latency trading systems
- API gateways
- proxy servers
- streaming systems
- high-concurrency microservices
If the system has many concurrent connections and a small number of active workers, event loops are often the right tool.
Continue exploring:
- 01-NIO-Selector-Architecture
- 01-NIO-Blocking-vs-NonBlocking
- 02-Thread-Pool-Mechanics
- 02-ExecutorService-Internals
- 04-Backpressure-Strategies
- 04-Performance-Overview
- 01-NIO-Channel-Buffer-Model
An event loop is not just a programming construct.
It is an architectural boundary.
It decides:
- what happens immediately
- what gets deferred
- what gets handed off
- what gets rejected
- what gets delayed
- what gets protected from overload
- what gets kept hot in cache
- what gets isolated into workers
The best engineers do not just write loops.
They design control systems that keep the system fast, fair, and stable under load.