Document multi-region deployment: DB replication, Redis enforcement, regional models#297

Draft
yassin-berriai wants to merge 1 commit into main from claude/hopeful-newton-iYthh



@yassin-berriai

Contributor

Relevant issues

Customers repeatedly ask how to deploy LiteLLM across multiple regions, but the existing Control Plane / Data Plane page only documented the admin/worker endpoint split. It said nothing about the three things people actually ask about: how the database is replicated across regions, what role Redis plays in enforcing budgets and rate limits, and how to scope models per region. This rewrite fills those gaps.

What changed

I rewrote docs/proxy/control_plane_and_data_plane.md around the three backing stores a stateless LiteLLM instance depends on, since that framing is what was missing.

Database across regions: documents that a single Postgres database holds keys, teams, users, budgets, spend, and (with store_model_in_db=true) models, and that every instance in every region reads and writes it. Covers the three replication patterns: a single primary with regional read replicas; a globally distributed Postgres such as Aurora Global, AlloyDB, Cosmos DB for PostgreSQL, or CockroachDB; or fully independent per-region databases via the High Availability Control Plane. Notes the cache-first auth reads and async batched spend writes so the latency implications are clear, and the requirement for a shared LITELLM_SALT_KEY.

Redis as a required component: explains that without a shared Redis each instance enforces rate limits and budgets independently, so the effective limit is the per-instance limit times the instance count. It then covers the case customers hit with teams that span regions: exact global enforcement needs every instance sharing a budget or limit to increment the same Redis keyspace. I laid out the trade-offs between per-region Redis, a single shared primary, and active-active replication, including that eventually consistent active-active allows bounded overshoot and that async single-writer global replicas (e.g. ElastiCache Global Datastore) do not give atomic global counters.
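
To make the "per-instance limit times instance count" arithmetic concrete, here is a minimal sketch in plain Python. It is not LiteLLM code: `Counter` is an in-memory stand-in for a counter store, and the limit, instance count, and key name are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Counter:
    """In-memory stand-in for a rate-limit counter store (assumption, not Redis)."""
    counts: dict = field(default_factory=dict)

    def incr(self, key: str) -> int:
        # Increment the counter for this key and return the new value.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def admitted(requests: int, limit: int, stores: list) -> int:
    """Spread requests round-robin over instances; each instance checks only its own store."""
    allowed = 0
    for i in range(requests):
        store = stores[i % len(stores)]
        if store.incr("team-a") <= limit:
            allowed += 1
    return allowed


LIMIT = 100     # hypothetical limit per window
INSTANCES = 4   # hypothetical number of instances across regions

# No shared store: every instance keeps a private counter.
independent = admitted(1000, LIMIT, [Counter() for _ in range(INSTANCES)])

# One shared keyspace: every instance increments the same counter.
shared_store = Counter()
shared = admitted(1000, LIMIT, [shared_store] * INSTANCES)

print(independent, shared)  # 400 100
```

With private counters the fleet admits 400 requests against a nominal limit of 100; with one shared counter it admits exactly 100. The sketch ignores replication lag, so it says nothing about the bounded overshoot of active-active setups.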

Models per region: documents the global-visibility behavior of store_model_in_db=true (a model added "for one region" is callable everywhere, which is why per-region UI aliases are cumbersome and leaky). Recommends giving the same public model_name a regional backend in each region's local config, so clients call one name and hit their own region with no leakage. Adds tag-based routing as the alternative for a single shared model list, and notes that allowed_model_region only covers the coarse eu/us buckets.
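
The "same public model_name, regional backend" recommendation can be illustrated with a small sketch. The dictionaries below mirror the shape of a LiteLLM `model_list` entry (`model_name` plus `litellm_params`), but the deployment names, `api_base` URLs, and the `resolve` helper are hypothetical and exist only for this example; in practice each region would load its own local config file.

```python
# Hypothetical per-region configs; each region's instances load only their own entry.
REGION_CONFIGS = {
    "eu": {
        "model_list": [
            {
                "model_name": "gpt-4o",  # public name clients call
                "litellm_params": {
                    "model": "azure/gpt-4o-eu",  # hypothetical EU deployment
                    "api_base": "https://eu.example.invalid",
                },
            }
        ]
    },
    "us": {
        "model_list": [
            {
                "model_name": "gpt-4o",  # same public name
                "litellm_params": {
                    "model": "azure/gpt-4o-us",  # hypothetical US deployment
                    "api_base": "https://us.example.invalid",
                },
            }
        ]
    },
}


def resolve(region: str, model_name: str) -> dict:
    """Return the backend parameters a region's local config maps model_name to."""
    for entry in REGION_CONFIGS[region]["model_list"]:
        if entry["model_name"] == model_name:
            return entry["litellm_params"]
    raise KeyError(f"{model_name!r} is not configured in region {region!r}")


eu_backend = resolve("eu", "gpt-4o")
us_backend = resolve("us", "gpt-4o")
print(eu_backend["api_base"], us_backend["api_base"])
```

Because each region only knows its own backend for `gpt-4o`, a client calling that name in the EU cannot reach the US deployment, which is the "no leakage" property described above.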

I kept the existing admin/data-plane endpoint split (DISABLE_ADMIN_UI, DISABLE_ADMIN_ENDPOINTS, DISABLE_LLM_API_ENDPOINTS) and client usage examples, and added a fault tolerance summary table mapping each component's failure mode to a mitigation.

Every technical claim was verified against the litellm source (the v3 parallel request limiter and spend counter logic for Redis behavior, the model storage and tag routing code for the models section) rather than asserted.

Pre-Submission checklist

  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have added meaningful tests (n/a, docs-only change)

Type

📖 Documentation

Changes

Rewrote docs/proxy/control_plane_and_data_plane.md. No other files touched; the page id, sidebar entry, and inbound links are unchanged.


Generated by Claude Code

…regional models

Rewrite the multi-region guide to cover the three backing stores customers
actually ask about and that were previously undocumented: the shared Postgres
and how to replicate it across regions, Redis as a required component for
accurate cross-instance budget and rate limit enforcement (with the
single-primary vs per-region vs active-active trade-offs for teams that span
regions), and how to keep each region's models regional instead of globally
visible. Keep the existing admin/data-plane endpoint split and add a fault
tolerance summary.
@vercel

vercel Bot commented Jun 4, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Jun 4, 2026 5:12pm |
