State-Preserving Consistent-Hash Load Balancer for AI Agent Swarms
When building agent swarms with frameworks like CrewAI, AutoGen, or LangGraph, agents spin up and down to handle dynamic workloads. A standard load balancer routes each incoming prompt without regard to its session (round-robin or random), so the receiving agent usually has to reload thousands of context tokens from scratch and pays full price on every turn.
| Scenario | Standard Load Balancer | Context-Ring |
|---|---|---|
| Session turn 1 | Agent A loads context | Agent A loads context |
| Session turn 2 | Agent C — cold start, full reload | Agent A again — cache hit |
| Session turn 10 | 10× full context loads | 1× context load |
| Agent node crashes | All sessions rerouted | Only 1/N sessions remapped |
Context-Ring is a production-grade reverse proxy that places session IDs and agent virtual nodes onto a consistent hash ring. Prompts from the same long-running task are deterministically routed to the exact same agent instance that already holds the chat history in local memory.
[ Incoming Prompt ]
│
▼
┌─────────────────────┐
│ Context-Ring Proxy │ ──► hash(session_id) [MurmurHash3, O(1)]
└─────────────────────┘
│
O(log N) BST lookup
│
▼
[ Consistent Hash Ring ]
├── agent-1 (vnode_0 … vnode_127)
├── agent-2 (vnode_0 … vnode_127) ◄── target (clockwise nearest)
└── agent-3 (vnode_0 … vnode_127)
│
▼
┌─────────────────────┐
│ Dedicated Agent │ ──► Context cache HIT ✓
└─────────────────────┘
- Deterministic routing: the same session_id always maps to the same agent node.
- Graceful scaling: adding or removing one node remaps only about 1/N of sessions, not all of them.
- Token cost reduction: avoids redundant context-window reloads on OpenAI, Claude, Gemini, and any other LLM API.
- Streaming passthrough: full SSE / chunked-transfer streaming support.
- Zero state transfer: agents are stateless from the proxy's perspective; state lives in their local memory.
┌───────────────────────────────────────────────────────────────────┐
│ Context-Ring Proxy │
│ │
│ FastAPI (ASGI) ──► SecurityMiddleware ──► RingManager │
│ │ │
│ ConsistentHashRing │
│ (mmh3 + bisect BST) │
│ ┌─────────────────────┘ │
│ │ asyncio.Lock (thread-safe) │
│ │ Redis pub/sub (gossip sync) │
└──────────────────────────────┼────────────────────────────────────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
agent-worker-1 agent-worker-2 agent-worker-3
(port 8001) (port 8001) (port 8001)
| Component | Choice | Why |
|---|---|---|
| Web framework | FastAPI + asyncio | Non-blocking ASGI; sub-ms routing overhead |
| Hash function | MurmurHash3 (mmh3) | Non-cryptographic and several times faster than SHA-256; excellent uniformity |
| Ring data structure | Sorted list + bisect | O(log N) clockwise lookup; pure Python |
| State sync | Redis pub/sub | Gossip-pattern node membership across proxy replicas |
| HTTP client | httpx (async) | Native async streaming proxy with connection pooling |
| Containerisation | Docker + Compose | Zero-config local stack; production-ready |
context-ring/
├── src/
│ ├── __init__.py # Package exports
│ ├── ring.py # ConsistentHashRing — core algorithm
│ ├── manager.py # RingManager — async façade + Redis sync
│ ├── main.py # FastAPI app — proxy endpoints + observability
│ └── security.py # Auth, rate limiting, session HMAC, headers
│
├── tests/
│ ├── conftest.py # Shared fixtures, env setup
│ ├── test_ring.py # Unit tests: hash, routing, distribution
│ ├── test_proxy.py # Integration tests: FastAPI endpoints
│ ├── test_security.py # Unit tests: auth, rate limiting, HMAC
│ └── test_manager.py # Redis integration tests (skipped without Redis)
│
├── scripts/
│ ├── admin.py # CLI: register, deregister, status, route, health
│ ├── benchmark.py # Performance benchmarks: hash, routing, distribution
│ ├── healthcheck.sh # Docker HEALTHCHECK script
│ └── load_test.py # Locust load-test scenario
│
├── examples/
│ ├── crewai_integration.py # CrewAI agent swarm wiring example
│ └── langgraph_integration.py # LangGraph StateGraph wiring example
│
├── k8s/
│ ├── deployment.yaml # Deployment, Service, ServiceAccount, HPA
│ ├── configmap.yaml # ConfigMap + Secret template
│ └── ingress.yaml # nginx-ingress with TLS, streaming, rate limits
│
├── docker/
│ └── prometheus.yml # Prometheus scrape config
│
├── .github/
│ └── workflows/
│ └── ci.yml # GitHub Actions: lint → test → docker → publish
│
├── Dockerfile # Multi-stage production image (non-root)
├── docker-compose.yml # Full local stack: proxy + agents + Redis + Prometheus
├── Makefile # Dev tasks: setup, test, lint, docker, bench, seed
├── pyproject.toml # Pytest, Ruff, Mypy configuration
├── requirements.txt # Production dependencies
├── requirements-dev.txt # Dev + test dependencies
├── .env.example # Environment variable reference
├── .gitignore
├── LICENSE # MIT
└── README.md
- Docker 24+ and Docker Compose v2
- Python 3.11+ (for local development)
git clone https://github.com/david-spies/context-ring.git
cd context-ring
cp .env.example .env
# Edit .env and set CONTEXT_RING_API_KEY to a strong random value
docker compose up --build
This starts:
- context-ring-proxy on :8000
- 3 × agent-worker mock nodes
- redis for state synchronisation
# Route a chat completion (session_id in JSON body)
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"session_id": "user-abc-123",
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello"}]
}'
# Or via header
curl -X POST http://localhost:8000/v1/chat/completions \
-H "X-Session-ID: user-abc-123" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [...]}'
The proxy injects three headers into every forwarded request:
- X-Context-Ring-Node: which agent received this request
- X-Context-Ring-Hash: the MurmurHash3 hex of the session ID
- X-Context-Ring-Vnode: the matched virtual node index
POST /v1/chat/completions routes a chat-completion payload to the correct agent node.
Session discriminator (one required):
- JSON body field: "session_id": "<string>"
- Request header: X-Session-ID: <string>
Responses:
| Status | Meaning |
|---|---|
| 200 | Request proxied successfully (streamed) |
| 400 | Missing or malformed session ID |
| 503 | No agent nodes registered |
| 502 | Target agent unreachable |
| 504 | Upstream timeout |
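A client can react differently to each of the status codes above. The following is a minimal sketch, assuming that 502, 503, and 504 are worth retrying with exponential backoff and that 400 means the caller has to fix its request. The retry policy and the helper names `next_action` and `backoff_seconds` are illustrative assumptions, not part of Context-Ring.

```python
# Hypothetical client-side policy for the proxy's documented status codes.
RETRYABLE = {502, 503, 504}  # unreachable agent, empty ring, upstream timeout


def next_action(status: int, attempt: int, max_attempts: int = 3) -> str:
    """Decide what a caller might do after one proxied request."""
    if status == 200:
        return "ok"
    if status == 400:
        # Missing or malformed session ID: retrying the same request cannot help.
        return "fix-request"
    if status in RETRYABLE and attempt < max_attempts:
        return "retry"
    return "fail"


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return min(cap, base * 2 ** (attempt - 1))
```

Because routing is deterministic, a retry with the same session ID reaches the same agent for as long as ring membership is unchanged.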
POST /v1/register registers a new agent node.
Requires X-Api-Key header.
{ "agent_url": "http://agent-worker-4:8001" }
Response 201:
{
"status": "mounted",
"node": "http://agent-worker-4:8001",
"vnodes": 128,
"ring_size": 512
}
POST /v1/deregister evicts a node from the ring.
Requires X-Api-Key header. Orphaned sessions reroute automatically on their next request.
{ "agent_url": "http://agent-worker-2:8001" }
Response 200:
{
"status": "evicted",
"node": "http://agent-worker-2:8001",
"orphaned_sessions": 14,
"ring_size": 384
}
Cluster health, arc distribution, and per-node statistics.
{
"healthy": true,
"node_count": 3,
"vnode_count": 384,
"uptime_seconds": 3610.42,
"nodes": [
{
"url": "http://agent-worker-1:8001",
"active_sessions": 47,
"total_routed": 1823,
"arc_fraction": 0.3341,
"vnode_count": 128
}
]
}
/healthz is the Kubernetes / Docker liveness probe.
- 200 {"status": "ok"}: at least one node is registered
- 503 {"status": "degraded"}: ring is empty
/metrics serves Prometheus-compatible text metrics.
context_ring_nodes_total 3
context_ring_vnodes_total 384
context_ring_requests_total{outcome="routed"} 4821
context_ring_requests_total{outcome="error"} 2
context_ring_uptime_seconds 3610.42
context_ring_node_sessions{node="http://agent-worker-1:8001"} 47
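For quick scripting without a Prometheus server, the text output above can be parsed directly. This is a rough sketch that handles only the simple `name{labels} value` lines shown in the sample; `parse_metrics` is a hypothetical helper, and a real Prometheus client library is the safer choice for anything beyond ad-hoc checks.

```python
import re

# name, optional {label="value",...} block, then a numeric value
_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text: str) -> dict:
    """Map (metric_name, sorted_label_pairs) to a float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _LINE.match(line)
        if match is None:  # ignore anything this simple pattern cannot handle
            continue
        labels = tuple(sorted(_LABEL.findall(match.group("labels") or "")))
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples
```

A lookup such as `samples[("context_ring_requests_total", (("outcome", "error"),))]` then gives the error counter as a float.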
All configuration is via environment variables. See .env.example for the full reference.
| Variable | Default | Description |
|---|---|---|
| CONTEXT_RING_API_KEY | (required) | Shared secret for admin endpoints |
| INITIAL_AGENT_NODES | "" | Comma-separated agent URLs to seed on startup |
| REDIS_URL | "" | Redis URL. Empty = standalone mode (no sync) |
| VNODE_REPLICAS | 128 | Virtual nodes per physical agent |
| PROXY_TIMEOUT_SECONDS | 60 | Upstream total timeout |
| PROXY_CONNECT_TIMEOUT_SECONDS | 5 | Upstream TCP connect timeout |
| SESSION_HMAC_SECRET | "" | HMAC key for signed session IDs (optional) |
| RATE_LIMIT_REQUESTS | 200 | Max requests per IP per window |
| RATE_LIMIT_WINDOW_SECONDS | 60 | Rate-limit window |
| MAX_BODY_BYTES | 10485760 | Maximum request body size (10 MB) |
| LOG_LEVEL | INFO | Python logging level |
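The table above could be read into a typed settings object along these lines. This is a sketch only: the variable names and defaults come from the table, while `ProxySettings` and `load_settings` are hypothetical names and not necessarily how the proxy itself loads its configuration.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxySettings:
    api_key: str
    initial_agent_nodes: tuple[str, ...]
    redis_url: str
    vnode_replicas: int
    proxy_timeout_seconds: float
    proxy_connect_timeout_seconds: float
    session_hmac_secret: str
    rate_limit_requests: int
    rate_limit_window_seconds: int
    max_body_bytes: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> ProxySettings:
    """Build settings from environment variables, applying the documented defaults."""
    api_key = env.get("CONTEXT_RING_API_KEY", "")
    if not api_key:
        raise RuntimeError("CONTEXT_RING_API_KEY is required")
    raw_nodes = env.get("INITIAL_AGENT_NODES", "")
    return ProxySettings(
        api_key=api_key,
        # Comma-separated list; blank entries and surrounding spaces are dropped.
        initial_agent_nodes=tuple(u.strip() for u in raw_nodes.split(",") if u.strip()),
        redis_url=env.get("REDIS_URL", ""),  # empty means standalone mode
        vnode_replicas=int(env.get("VNODE_REPLICAS", "128")),
        proxy_timeout_seconds=float(env.get("PROXY_TIMEOUT_SECONDS", "60")),
        proxy_connect_timeout_seconds=float(env.get("PROXY_CONNECT_TIMEOUT_SECONDS", "5")),
        session_hmac_secret=env.get("SESSION_HMAC_SECRET", ""),
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "200")),
        rate_limit_window_seconds=int(env.get("RATE_LIMIT_WINDOW_SECONDS", "60")),
        max_body_bytes=int(env.get("MAX_BODY_BYTES", "10485760")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.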
make setup # Create venv and install all dependencies
make dev # Run proxy locally with hot-reload
make test # Full test suite with coverage report
make test-unit # Unit tests only (no network/Redis required)
make lint # Ruff linter
make format # Auto-format with ruff
make typecheck # mypy static type check
make check # lint + typecheck combined
make docker-build # Build production Docker image
make up # Start full docker-compose stack
make down # Stop and remove stack
make logs # Tail proxy logs
make bench # Run performance benchmark suite
make load-test # Start Locust (opens browser at :8089)
make seed # Register default agent nodes via API
make status # Print live ring status from running proxy
make clean # Remove venv, cache, build artefacts
scripts/admin.py is a zero-dependency CLI for managing a running proxy. It requires only the Python standard library.
# Set env vars or pass --proxy / --key flags
export CONTEXT_RING_PROXY_URL=http://localhost:8000
export CONTEXT_RING_API_KEY=your-key
# Register nodes
python scripts/admin.py register http://agent-1:8001 http://agent-2:8001
# Print ring status with arc distribution bar chart
python scripts/admin.py status
# Show which node a session ID routes to (local simulation, no side effects)
python scripts/admin.py route user-session-abc123
# Evict a node
python scripts/admin.py deregister http://agent-2:8001
# Liveness check
python scripts/admin.py health
scripts/benchmark.py runs six benchmarks against the in-process ring with no external dependencies.
python scripts/benchmark.py
Sample output on a modern laptop:
1. MurmurHash3 raw throughput
Throughput : 3.46M ops/sec
Avg latency: 0.289 µs/hash
3. get_node routing throughput — O(log N)
3 nodes × 128 vnodes → 818K ops/sec (1.22 µs/lookup)
10 nodes × 128 vnodes → 767K ops/sec (1.30 µs/lookup)
50 nodes × 128 vnodes → 719K ops/sec (1.39 µs/lookup)
100 nodes × 128 vnodes → 685K ops/sec (1.46 µs/lookup)
5. Session stability after scale events
3 → 4 nodes (scale-out): 2,941/10,000 remapped (29.4% expected ≈25.0%)
4 → 3 nodes (scale-in) : 0/10,000 remapped ( 0.0% expected ≈ 0.0%)
A full working example is in examples/crewai_integration.py. The core pattern:
import requests

class ContextRingRouter:
    """Thin client for the Context-Ring proxy."""

    def __init__(self, proxy_url: str, api_key: str):
        self.proxy_url = proxy_url
        self.headers = {"X-Api-Key": api_key}

    def register_agent(self, agent_url: str):
        # Admin endpoint: requires the X-Api-Key header
        requests.post(f"{self.proxy_url}/v1/register",
                      json={"agent_url": agent_url},
                      headers=self.headers)

    def chat(self, session_id: str, messages: list) -> dict:
        # The session_id in the body pins this conversation to one agent node
        return requests.post(
            f"{self.proxy_url}/v1/chat/completions",
            json={"session_id": session_id, "messages": messages},
        ).json()
For AutoGen, pass the session ID as a default header in the LLM config:
import autogen
config_list = [{
    "model": "gpt-4o",
    "base_url": "http://localhost:8000",
    "api_key": "your-openai-key",
    "default_headers": {"X-Session-ID": "crew-task-42"},
}]
assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
A full working example is in examples/langgraph_integration.py. The core pattern:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="gpt-4o",
openai_api_base="http://localhost:8000",
default_headers={"X-Session-ID": "langgraph-workflow-xyz"},
)
To keep the ring in sync with the pod lifecycle, register and deregister agents from your orchestrator:
import os
import httpx

PROXY = "http://context-ring-proxy:8000"
API_KEY = os.getenv("CONTEXT_RING_API_KEY")

async def on_pod_ready(pod_url: str):
    # Mount the new pod on the ring once it can serve traffic
    async with httpx.AsyncClient() as c:
        await c.post(f"{PROXY}/v1/register",
                     json={"agent_url": pod_url},
                     headers={"X-Api-Key": API_KEY})

async def on_pod_terminating(pod_url: str):
    # Evict the pod so its sessions reroute on their next request
    async with httpx.AsyncClient() as c:
        await c.post(f"{PROXY}/v1/deregister",
                     json={"agent_url": pod_url},
                     headers={"X-Api-Key": API_KEY})
Manifests are in the k8s/ directory.
# 1. Create namespace
kubectl apply -f k8s/configmap.yaml # creates namespace + ConfigMap
# 2. Create secrets (replace placeholder values first)
kubectl create secret generic context-ring-secrets \
--namespace context-ring \
--from-literal=api-key="$(openssl rand -base64 32)" \
--from-literal=hmac-secret="$(openssl rand -base64 32)" \
--from-literal=redis-url="redis://:password@redis:6379/0"
# 3. Deploy proxy (2 replicas + HPA)
kubectl apply -f k8s/deployment.yaml
# 4. Expose externally with TLS (requires nginx-ingress + cert-manager)
kubectl apply -f k8s/ingress.yaml
k8s/deployment.yaml includes a HorizontalPodAutoscaler that scales proxy replicas between 2 and 10 based on CPU/memory. All proxy replicas share ring state via Redis pub/sub.
To prevent topology-inference attacks — where an adversary crafts session IDs that deliberately hash to a target node — enable HMAC-signed sessions:
# .env
SESSION_HMAC_SECRET=your-strong-random-secret
Then sign session IDs at the client:
from src.security import sign_session_id, verify_session_id
# Client side: sign before sending
signed = sign_session_id("user-session-12345")
# → "user-session-12345.<16 hex chars>"
# Proxy side: automatically verified before routing
The HMAC truncates to 16 hex characters (64-bit MAC), which provides adequate forgery resistance for session routing while keeping headers compact.
make setup
source .venv/bin/activate
make dev
# Proxy available at http://localhost:8000
# Swagger UI at http://localhost:8000/docs
make test # all tests + HTML coverage report
make test-unit # ring + security unit tests only (no Redis needed)
# With Redis for manager integration tests
REDIS_URL=redis://localhost:6379/0 pytest tests/test_manager.py -v
make check # lint + typecheck
make format # auto-fix formatting
make load-test # opens Locust UI at http://localhost:8089
make load-test-headless # 50 users, 10/s ramp, 60s run
docker compose --profile monitoring up
# Prometheus UI: http://localhost:9090
# Query: context_ring_requests_total
Each physical agent node is mapped to VNODE_REPLICAS (default 128) virtual positions on a 32-bit ring (0 → 0xFFFFFFFF). When a request arrives:
- Hash session_id with MurmurHash3 → 32-bit integer H
- Binary-search the sorted vnode list for the first vnode with hash ≥ H
- If none exists, wrap to index 0 (ring semantics)
- Return the physical node that owns that vnode
Complexity: O(log N) where N = total vnode count.
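The lookup steps above can be sketched with the standard library alone. This is an illustration, not the project's ConsistentHashRing: `SketchRing`, the `node#vnode_i` key format, and the BLAKE2b-based `hash32` stand-in (used so the example runs without mmh3) are all assumptions.

```python
import bisect
import hashlib


def hash32(key: str) -> int:
    """Stand-in 32-bit hash (assumption); the project uses MurmurHash3 via mmh3."""
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=4).digest(), "big")


class SketchRing:
    """Illustrative consistent hash ring built on a sorted list plus bisect."""

    def __init__(self, replicas: int = 128) -> None:
        self.replicas = replicas
        self._hashes: list[int] = []       # sorted vnode positions on the ring
        self._owners: dict[int, str] = {}  # vnode position -> physical node

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            h = hash32(f"{node}#vnode_{i}")  # hypothetical vnode key format
            if h in self._owners:            # rare collision: keep the first owner
                continue
            bisect.insort(self._hashes, h)
            self._owners[h] = node

    def remove_node(self, node: str) -> None:
        self._hashes = [h for h in self._hashes if self._owners[h] != node]
        self._owners = {h: n for h, n in self._owners.items() if n != node}

    def get_node(self, session_id: str) -> str:
        if not self._hashes:
            raise LookupError("ring is empty")
        h = hash32(session_id)
        idx = bisect.bisect_left(self._hashes, h)  # first vnode with hash >= h
        if idx == len(self._hashes):               # past the end: wrap to index 0
            idx = 0
        return self._owners[self._hashes[idx]]
```

Only `get_node` sits on the request path and costs one binary search; `add_node` and `remove_node` are linear in the vnode count but run only on membership changes.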
| Property | MurmurHash3 | SHA-256 |
|---|---|---|
| Throughput | ~3 GB/s | ~500 MB/s |
| Output | 32-bit | 256-bit |
| Cryptographic | No | Yes |
| Avalanche effect | Excellent | Excellent |
For session routing, we need speed and uniform distribution — not cryptographic security. MurmurHash3 delivers both.
Without virtual nodes, each physical node occupies one arc on the ring, leading to highly uneven load. With 128 replicas, the probability of any node holding >2× its fair share drops below 0.1%.
Replicas | Max arc imbalance (3 nodes, empirical)
----------+------------------------------------------
10 | ~45%
50 | ~25%
128 | ~12%
200 | ~8%
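One way to reproduce this kind of measurement is to compute each node's share of the ring directly from its vnode positions. The sketch below makes several assumptions: a BLAKE2b stand-in for MurmurHash3, a hypothetical `node#vnode_i` key format, and "imbalance" defined as the largest relative deviation from the fair share 1/N. Its numbers will therefore not match the table exactly.

```python
import hashlib

RING_SIZE = 2**32


def hash32(key: str) -> int:
    """Stand-in 32-bit hash (assumption); the project uses MurmurHash3."""
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=4).digest(), "big")


def arc_fractions(nodes: list[str], replicas: int) -> dict[str, float]:
    """Fraction of the 32-bit ring owned by each physical node."""
    pairs = sorted((hash32(f"{n}#vnode_{i}"), n) for n in nodes for i in range(replicas))
    owned = dict.fromkeys(nodes, 0)
    # Each vnode owns the arc from the previous vnode (exclusive) up to itself;
    # the first vnode's arc wraps around from the last vnode on the ring.
    prev = pairs[-1][0] - RING_SIZE
    for h, node in pairs:
        owned[node] += h - prev
        prev = h
    return {n: span / RING_SIZE for n, span in owned.items()}


def max_imbalance(nodes: list[str], replicas: int) -> float:
    """Largest relative deviation of any node's share from the fair share 1/N."""
    fair = 1 / len(nodes)
    return max(abs(f - fair) / fair for f in arc_fractions(nodes, replicas).values())


for replicas in (10, 50, 128, 200):
    print(replicas, f"{max_imbalance(['agent-1', 'agent-2', 'agent-3'], replicas):.1%}")
```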
With N nodes, each node owns ~1/N of the ring. Adding one node causes ~1/(N+1) of sessions to migrate. Removing one node causes ~1/N of sessions to migrate. All other sessions are unaffected.
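The migration claim can be checked empirically by routing the same session IDs before and after a membership change. The sketch below is self-contained and uses assumed helpers (`build_ring`, `route`, `remapped_fraction`) with a BLAKE2b stand-in for MurmurHash3, so exact percentages will differ from the project's own benchmark.

```python
import bisect
import hashlib


def hash32(key: str) -> int:
    """Stand-in 32-bit hash (assumption); the project uses MurmurHash3."""
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=4).digest(), "big")


def build_ring(nodes, replicas=128):
    """Return parallel lists: sorted vnode hashes and their owning nodes."""
    pairs = sorted((hash32(f"{n}#vnode_{i}"), n) for n in nodes for i in range(replicas))
    return [h for h, _ in pairs], [n for _, n in pairs]


def route(ring, session_id):
    hashes, owners = ring
    # bisect_left finds the first vnode >= hash; the modulo wraps past the end to 0.
    return owners[bisect.bisect_left(hashes, hash32(session_id)) % len(hashes)]


def remapped_fraction(before_nodes, after_nodes, sessions=10_000):
    """Share of sessions whose target node differs between two memberships."""
    before, after = build_ring(before_nodes), build_ring(after_nodes)
    moved = sum(
        route(before, f"session-{i}") != route(after, f"session-{i}")
        for i in range(sessions)
    )
    return moved / sessions


three = ["agent-1", "agent-2", "agent-3"]
four = three + ["agent-4"]
print("scale-out 3 -> 4:", remapped_fraction(three, four))
print("scale-in  4 -> 3:", remapped_fraction(four, three))
```

On scale-out, every session that moves lands on the newly added node; sessions owned by the other nodes keep their mapping.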
- Set a strong CONTEXT_RING_API_KEY (at least 32 random bytes)
- Configure REDIS_URL for multi-proxy / HA deployments
- Set SESSION_HMAC_SECRET to prevent topology-inference attacks
- Tune VNODE_REPLICAS (128 is a good default; increase for very heterogeneous clusters)
- Place the proxy behind a TLS-terminating reverse proxy (nginx / Caddy / ALB)
- Set CORS_ORIGINS to specific domains (not *)
- Configure RATE_LIMIT_REQUESTS appropriate to your expected traffic
- Set up /metrics scraping in Prometheus
- Wire /healthz to your orchestrator's health check (K8s readinessProbe / ECS)
- Implement on_pod_ready / on_pod_terminating hooks in your agent orchestrator
MIT — see LICENSE.
Pull requests are welcome. For significant changes, open an issue first to discuss the approach. Ensure make check and make test both pass before submitting.