Add per-user LLM rate limit to /api/ai/triage #16
Merged
The global per-IP rate limit doesn't fit the cost shape of LLM calls.
This adds a second sliding window keyed on the authenticated user's
UUID (stable across JWT and API-key auth) so:
- ten operators behind one NAT don't share a token budget
- one operator hammering from many IPs can't bypass the limit
- a per-VPS agent key (already blocked by RBAC) is gated again here
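A minimal sketch of the per-user sliding window described above, with an in-memory map standing in for Redis so the logic can be read in isolation. The `SlidingWindow` and `Decision` names are hypothetical, and passing the current time in as a parameter is an assumption made for illustration; the `llm-rate:{user_id}` key format is the one named in the PR description. This is not the code from the PR.

```rust
use std::collections::{HashMap, VecDeque};

/// Outcome of a rate-limit check.
#[derive(Debug, PartialEq)]
enum Decision {
    Allowed,
    /// Denied; a slot frees up after this many seconds.
    Denied { retry_after_secs: u64 },
}

/// In-memory stand-in for the Redis-backed window (hypothetical type).
struct SlidingWindow {
    max_requests: usize,
    window_secs: u64,
    /// One queue of request timestamps per key, oldest first.
    hits: HashMap<String, VecDeque<u64>>,
}

impl SlidingWindow {
    fn new(max_requests: usize, window_secs: u64) -> Self {
        Self { max_requests, window_secs, hits: HashMap::new() }
    }

    /// Records a request for `user_id` at `now_secs` if the budget allows it.
    fn check(&mut self, user_id: &str, now_secs: u64) -> Decision {
        // Keyed on the user id, not the client IP.
        let queue = self.hits.entry(format!("llm-rate:{user_id}")).or_default();

        // Drop timestamps that have slid out of the window.
        while let Some(&oldest) = queue.front() {
            if oldest + self.window_secs <= now_secs {
                queue.pop_front();
            } else {
                break;
            }
        }

        if queue.len() < self.max_requests {
            queue.push_back(now_secs);
            return Decision::Allowed;
        }

        // The oldest surviving hit decides when the next slot frees up.
        let retry_after_secs = match queue.front() {
            Some(&oldest) => (oldest + self.window_secs - now_secs).max(1),
            // Only reachable when max_requests == 0.
            None => self.window_secs.max(1),
        };
        Decision::Denied { retry_after_secs }
    }
}
```

Because the key is the user's id, two users get independent budgets wherever they connect from, and all of one user's requests land in the same window regardless of source IP.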
Behaviour:
- Configurable via LLM_RATE_LIMIT_REQUESTS (default 20) and
LLM_RATE_LIMIT_WINDOW (default 3600s = 1 hour).
- On exhaustion: HTTP 429 with `Retry-After` header (seconds).
- Fails open if Redis is unreachable — availability over perfect
accounting when our own infra hiccups.
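The configuration and fail-open behaviour listed above could look roughly like this. It is a sketch under stated assumptions, not the PR's code: `LlmRateLimitConfig`, `BackendUnavailable`, and `is_allowed` are hypothetical names, the variables are read from a plain map rather than the process environment so the example stays deterministic, and falling back to the default on an unparsable value is an assumption.

```rust
use std::collections::HashMap;

/// Limits for the per-user LLM window.
#[derive(Debug, PartialEq)]
struct LlmRateLimitConfig {
    max_requests: u64,
    window_secs: u64,
}

impl LlmRateLimitConfig {
    /// Builds the config from an environment-like map, using the documented
    /// defaults (20 requests per 3600 s) when a variable is missing.
    /// Assumption: an unparsable value also falls back to the default.
    fn from_vars(vars: &HashMap<String, String>) -> Self {
        let read = |name: &str, default: u64| -> u64 {
            vars.get(name)
                .and_then(|raw| raw.trim().parse::<u64>().ok())
                .unwrap_or(default)
        };
        Self {
            max_requests: read("LLM_RATE_LIMIT_REQUESTS", 20),
            window_secs: read("LLM_RATE_LIMIT_WINDOW", 3600),
        }
    }
}

/// Hypothetical stand-in for "Redis is unreachable".
#[derive(Debug)]
struct BackendUnavailable;

/// Decides whether a request may proceed. `count` is the number of requests
/// seen in the current window, including this one.
fn is_allowed(count: Result<u64, BackendUnavailable>, config: &LlmRateLimitConfig) -> bool {
    match count {
        Ok(n) => n <= config.max_requests,
        // Fail open: a counter outage must not take the endpoint down.
        Err(BackendUnavailable) => true,
    }
}
```

The trade-off is the one stated in the list: during a Redis outage a user can temporarily exceed the budget, in exchange for the endpoint staying available.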
New AppError::TooManyRequests variant with optional retry_after_secs
mapped to the Retry-After header. Two unit tests on the IntoResponse
mapping + a new integration test that sets the limit to 2 and verifies
the third request gets 429 with a sane Retry-After value.
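To illustrate the mapping the unit tests exercise, here is a framework-free sketch. The PR implements this through an `IntoResponse` impl; the version below replaces that with a hypothetical `to_response` method and a hypothetical `SimpleResponse` struct so it compiles without any web framework. The `Other` variant is a placeholder for the project's existing variants, which are not shown in this PR text.

```rust
/// Trimmed-down stand-in for the project's error type.
#[derive(Debug)]
enum AppError {
    /// Placeholder for the pre-existing variants (hypothetical).
    Other { status: u16, message: String },
    /// The new variant; the retry hint is optional.
    TooManyRequests { message: String, retry_after_secs: Option<u64> },
}

/// Framework-free picture of an HTTP response (hypothetical).
#[derive(Debug, PartialEq)]
struct SimpleResponse {
    status: u16,
    headers: Vec<(&'static str, String)>,
    body: String,
}

impl AppError {
    fn to_response(&self) -> SimpleResponse {
        match self {
            AppError::TooManyRequests { message, retry_after_secs } => {
                let mut headers = Vec::new();
                // Emit Retry-After only when the limiter knows the wait.
                if let Some(secs) = retry_after_secs {
                    headers.push(("Retry-After", secs.to_string()));
                }
                SimpleResponse { status: 429, headers, body: message.clone() }
            }
            AppError::Other { status, message } => SimpleResponse {
                status: *status,
                headers: Vec::new(),
                body: message.clone(),
            },
        }
    }
}
```

Keeping the hint as `Option<u64>` lets a handler return 429 even when no wait time is known, which matches the "header present/absent" cases the unit tests cover.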
Docs: docs/ai-triage.{en,zh}.md gain a "Per-user rate limit" subsection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lai3d added a commit that referenced this pull request on May 20, 2026
Backend PR #13 made /api/ai/triage admin/operator-only (403 for readonly and per-VPS agent keys), and PR #16 added a per-user LLM rate limit (429 with Retry-After). The web UI silently routed both into a generic red "Request failed" banner and still rendered the "Ask AI" button on the VPS detail page for every authenticated role.

- VpsDetail: gate the "Ask AI" button behind canMutate (admin|operator), matching the other mutation actions on the page.
- AiTriageDialog: branch on axios error status.
  - 429 surfaces the Retry-After header as "Try again in X minutes Y seconds" in an amber panel (it's a recoverable hint, not a hard error).
  - 403 renders a friendly "your role doesn't permit AI triage" amber panel. Readonly users can still reach the standalone /ai-triage page from the sidebar, so the dialog still needs this branch.
  - Other errors keep the existing red banner.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
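The arithmetic behind the "Try again in X minutes Y seconds" hint is small enough to sketch. The dialog itself lives in the web UI, so the Rust below only illustrates the conversion and is not the component's code; `retry_hint` is a hypothetical name. It assumes the header carries a delay in seconds, as this backend sends it, and does not handle the HTTP-date form of `Retry-After` or singular/plural wording.

```rust
/// Turns a `Retry-After` header value (delay in seconds) into the hint text.
/// Returns `None` when the header is missing or is not a whole number of
/// seconds, in which case a caller would fall back to a generic message.
fn retry_hint(retry_after: Option<&str>) -> Option<String> {
    let total_secs: u64 = retry_after?.trim().parse().ok()?;
    let (minutes, seconds) = (total_secs / 60, total_secs % 60);
    Some(format!("Try again in {minutes} minutes {seconds} seconds"))
}
```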
Summary
Adds a second rate-limit layer specifically for LLM-spending requests. The existing global per-IP limiter doesn't fit the cost shape — it protects connection pressure, not token budget — and an admin/operator who's already past RBAC can still spend without bound.
Mechanism
A sliding window keyed on the authenticated user's UUID (`CurrentUser.id`, stable across both JWT and API-key auth). Implemented in `routes::ai_triage::check_llm_rate_limit` using the same Redis pattern as the global limiter but with a separate prefix (`llm-rate:{user_id}`).
- Limit: `LLM_RATE_LIMIT_REQUESTS` per window
- On exhaustion: `429 Too Many Requests` with `Retry-After: <secs>`

Defaults:
- `LLM_RATE_LIMIT_REQUESTS`: 20
- `LLM_RATE_LIMIT_WINDOW`: 3600 (seconds)

Independent of the global `RATE_LIMIT_REQUESTS` / `RATE_LIMIT_WINDOW`: ten operators behind one NAT no longer share a token budget, and one operator hammering from many IPs can't bypass it.

Why a new AppError variant
Existing handler errors map to 4 status codes; `429` was missing. Added `AppError::TooManyRequests { message, retry_after_secs }` so the new code path stays in the standard handler return type (`Result<Json<T>, AppError>`) rather than constructing a `Response` inline. Cheap, well-tested, and reusable for future endpoints that need 429.

Tests
- `AppError::TooManyRequests` `IntoResponse`: 429 status, `Retry-After` header present/absent.
- Integration test: with the limit set to 2, the third request gets 429 with `Retry-After` in `1..=60`.

Docs
`docs/ai-triage.{en,zh}.md` gain a "Per-user rate limit" subsection between Auth and OpenAPI.

Test plan
- `cargo build --tests`: already green locally
- `cargo test --lib errors::`: 7/7 green locally (includes 2 new)
- With `LLM_RATE_LIMIT_REQUESTS=2`, confirm the 3rd request returns 429 with `Retry-After`

🤖 Generated with Claude Code