Document multi-region deployment: DB replication, Redis enforcement, regional models #297
Draft
yassin-berriai wants to merge 1 commit into
…regional models Rewrite the multi-region guide to cover the three backing stores customers actually ask about and that were previously undocumented: the shared Postgres and how to replicate it across regions, Redis as a required component for accurate cross-instance budget and rate limit enforcement (with the single-primary vs per-region vs active-active trade-offs for teams that span regions), and how to keep each region's models regional instead of globally visible. Keep the existing admin/data-plane endpoint split and add a fault tolerance summary.
Relevant issues
Customers repeatedly ask how to deploy LiteLLM across multiple regions, and the existing Control Plane / Data Plane page only documented the admin/worker endpoint split. It said nothing about the three things people actually ask about: how the database is replicated across regions, the role of Redis in enforcing budgets and rate limits, and how to scope models per region. This rewrite fills those gaps.
What changed
I rewrote docs/proxy/control_plane_and_data_plane.md around the three backing stores a stateless LiteLLM instance depends on, since that framing is what was missing:
- Database across regions; documents that a single Postgres holds keys, teams, users, budgets, spend, and (with store_model_in_db=true) models, that every instance in every region reads and writes it, and the three replication patterns (single primary with regional read replicas, globally distributed Postgres like Aurora Global / AlloyDB / Cosmos for Postgres / CockroachDB, or fully independent per-region DBs via the High Availability Control Plane). Notes the cache-first auth reads and async batched spend writes so the latency implications are clear, and the shared LITELLM_SALT_KEY requirement.
- Redis as a required component; explains that without a shared Redis each instance enforces rate limits and budgets independently, so the effective limit is the per-instance limit times the instance count. Then the part customers hit with teams that span regions: exact global enforcement needs every instance sharing a budget or limit to increment the same Redis keyspace, so I laid out the per-region vs single shared primary vs active-active trade-offs honestly, including that eventually-consistent active-active allows bounded overshoot and that async single-writer global replicas (e.g. ElastiCache Global Datastore) do not give atomic global counters.
- Models per region; documents the global-visibility behavior of store_model_in_db=true (a model added "for one region" is callable everywhere, which is why per-region UI aliases are cumbersome and leaky), and recommends giving the same public model_name a regional backend in each region's local config so clients call one name and hit their own region with no leakage. Adds tag-based routing as the alternative for a single shared model list, and notes that allowed_model_region only covers the coarse eu/us buckets.
I kept the existing admin/data-plane endpoint split (DISABLE_ADMIN_UI, DISABLE_ADMIN_ENDPOINTS, DISABLE_LLM_API_ENDPOINTS) and client usage examples, and added a fault tolerance summary table mapping each component's failure mode to a mitigation.
Every technical claim was verified against the litellm source (the v3 parallel request limiter and spend counter logic for Redis behavior, the model storage and tag routing code for the models section) rather than asserted.
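The Redis point above (per-instance limit times instance count) can be illustrated with a small sketch. It uses plain in-memory counters as a stand-in for Redis and a simple fixed-window check; the function name `allowed_requests` and all numbers are hypothetical, and this is not LiteLLM's limiter code.

```python
from collections import Counter


def allowed_requests(limit: int, instances: int,
                     attempts_per_instance: int, shared: bool) -> int:
    """Count requests admitted in one window under a fixed-window limit.

    shared=True  : every instance increments one counter
                   (stand-in for a single shared Redis key).
    shared=False : each instance keeps its own local counter
                   (stand-in for running without a shared Redis).
    """
    counters = Counter()
    admitted = 0
    for instance in range(instances):
        # Hypothetical key names, chosen only for this illustration.
        key = "global" if shared else f"instance-{instance}"
        for _ in range(attempts_per_instance):
            if counters[key] < limit:
                counters[key] += 1
                admitted += 1
    return admitted


LIMIT = 100      # configured limit per window (assumed value)
INSTANCES = 3    # proxy instances behind the load balancer (assumed value)

local_total = allowed_requests(LIMIT, INSTANCES, attempts_per_instance=150, shared=False)
shared_total = allowed_requests(LIMIT, INSTANCES, attempts_per_instance=150, shared=True)
print(local_total, shared_total)  # 300 100
```

With local counters the three instances together admit 300 requests against a configured limit of 100; with one shared counter the total stays at 100. The sketch ignores concurrency and replication lag, which is where the active-active overshoot described above would come from.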
Pre-Submission checklist
Type
📖 Documentation
Changes
Rewrote docs/proxy/control_plane_and_data_plane.md. No other files touched; the page id, sidebar entry, and inbound links are unchanged.
Generated by Claude Code