Add per-user LLM rate limit to /api/ai/triage by lai3d · Pull Request #16 · lai3d/sigma

lai3d · 2026-05-19T13:43:52Z

Summary

Adds a second rate-limit layer specifically for requests that spend LLM tokens. The existing global per-IP limiter doesn't fit that cost shape: it guards against connection pressure, not token spend, so an admin or operator who has already passed RBAC can still spend without bound.

Mechanism

A sliding window keyed on the authenticated user's UUID (CurrentUser.id, stable across both JWT and API-key auth). Implemented in routes::ai_triage::check_llm_rate_limit using the same Redis pattern as the global limiter but with a separate prefix (llm-rate:{user_id}).
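To make the sliding-window idea concrete, here is a minimal in-memory sketch. It is not the code from `check_llm_rate_limit`: the PR stores its counters in Redis, and the exact Redis commands are not shown on this page. The `SlidingWindow` and `Decision` types and the timestamp-log approach are assumptions chosen for illustration; only the `llm-rate:{user_id}` key shape comes from the description above.

```rust
use std::collections::{HashMap, VecDeque};

/// Outcome of a rate-limit check (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Decision {
    Allowed,
    /// Rejected; the caller may retry after this many seconds.
    Limited { retry_after_secs: u64 },
}

/// In-memory stand-in for the Redis-backed limiter: one timestamp log per key.
struct SlidingWindow {
    max_requests: usize,
    window_secs: u64,
    hits: HashMap<String, VecDeque<u64>>,
}

impl SlidingWindow {
    fn new(max_requests: usize, window_secs: u64) -> Self {
        Self { max_requests, window_secs, hits: HashMap::new() }
    }

    /// Record a request for `user_id` at time `now` (in seconds) and decide.
    fn check(&mut self, user_id: &str, now: u64) -> Decision {
        // Same key shape as the PR's Redis prefix.
        let key = format!("llm-rate:{user_id}");
        let log = self.hits.entry(key).or_default();

        // Drop timestamps that have slid out of the window.
        while let Some(&oldest) = log.front() {
            if oldest + self.window_secs <= now {
                log.pop_front();
            } else {
                break;
            }
        }

        if log.len() < self.max_requests {
            log.push_back(now);
            return Decision::Allowed;
        }

        // A slot frees up when the oldest surviving hit leaves the window.
        let oldest = log.front().copied().unwrap_or(now);
        let retry_after_secs = (oldest + self.window_secs - now).max(1);
        Decision::Limited { retry_after_secs }
    }
}
```

With `SlidingWindow::new(2, 60)`, two calls for the same user are allowed and a third inside the window is rejected with a `retry_after_secs` between 1 and 60, which mirrors the scenario the integration test describes. A different user id gets its own log, so users never share a budget.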

| Scenario | Behaviour |
| --- | --- |
| Within `LLM_RATE_LIMIT_REQUESTS` per window | 200 (and counts toward the next window) |
| Above limit | `429 Too Many Requests` with `Retry-After: <secs>` |
| Redis unreachable | Fails open (warn log): availability over perfect accounting |
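The fail-open row can be read as a small decision rule: a backend error is logged and treated as "allow". The sketch below is illustrative only. `Verdict`, `BackendUnavailable`, and `resolve` are hypothetical names, and `eprintln!` stands in for whatever warn-level logging the service actually uses.

```rust
/// What the limiter decided for one request (hypothetical type).
#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Reject { retry_after_secs: u64 },
}

/// Hypothetical error for "the counter store could not be reached".
#[derive(Debug)]
struct BackendUnavailable(String);

/// Fail open: if the store is down, let the request through and log a warning.
fn resolve(outcome: Result<Verdict, BackendUnavailable>) -> Verdict {
    match outcome {
        Ok(verdict) => verdict,
        Err(err) => {
            // Stand-in for a warn-level log line.
            eprintln!("warn: LLM rate limiter unavailable, allowing request: {err:?}");
            Verdict::Allow
        }
    }
}
```

The trade-off is the one the table states: during a Redis outage some requests go uncounted, but triage stays available.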

| Env var | Default | Purpose |
| --- | --- | --- |
| `LLM_RATE_LIMIT_REQUESTS` | `20` | Max triages per window, per user |
| `LLM_RATE_LIMIT_WINDOW` | `3600` | Window length in seconds (1 hour) |
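One way these two variables could be read with their defaults is sketched below. The struct name, its fields, and the choice to fall back to the default on an unparsable value are assumptions, not details taken from the PR; only the variable names and default values come from the table.

```rust
/// Hypothetical config holder; field names are illustrative.
#[derive(Debug, PartialEq)]
struct LlmRateLimitConfig {
    max_requests: u64,
    window_secs: u64,
}

impl LlmRateLimitConfig {
    /// Build from any key lookup, so tests need not touch the process environment.
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        // Assumption: unset or unparsable values fall back to the default.
        let read = |key: &str, default: u64| {
            lookup(key)
                .and_then(|value| value.trim().parse().ok())
                .unwrap_or(default)
        };
        Self {
            max_requests: read("LLM_RATE_LIMIT_REQUESTS", 20),
            window_secs: read("LLM_RATE_LIMIT_WINDOW", 3600),
        }
    }

    /// Read from the real process environment.
    fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Taking the lookup as a closure keeps the parsing logic testable without mutating global environment state.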

Independent of the global RATE_LIMIT_REQUESTS / RATE_LIMIT_WINDOW — ten operators behind one NAT no longer share a token budget, and one operator hammering from many IPs can't bypass it.

Why a new AppError variant

Existing handler errors map to 4 status codes; 429 was missing. Added AppError::TooManyRequests { message, retry_after_secs } so the new code path stays in the standard handler return type (Result<Json<T>, AppError>) rather than constructing a Response inline. Cheap, well-tested, and reusable for future endpoints that need 429.
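The shape of the new variant and its mapping can be sketched without any web framework. In the PR the mapping lives in the `IntoResponse` implementation; here a plain `SimpleResponse` struct stands in for the HTTP response so the example stays dependency-free. `SimpleResponse`, `into_simple_response`, and the `Internal` variant are hypothetical, and `retry_after_secs` is modelled as an `Option<u64>` on the assumption that "optional" in the commit message means exactly that.

```rust
/// Dependency-free stand-in for an HTTP response (hypothetical).
#[derive(Debug, PartialEq)]
struct SimpleResponse {
    status: u16,
    headers: Vec<(String, String)>,
    body: String,
}

enum AppError {
    /// Hypothetical placeholder for the pre-existing variants.
    Internal(String),
    /// The variant this PR adds.
    TooManyRequests { message: String, retry_after_secs: Option<u64> },
}

impl AppError {
    /// Map the error to a status code, optional headers, and a body.
    fn into_simple_response(self) -> SimpleResponse {
        match self {
            AppError::Internal(message) => SimpleResponse {
                status: 500,
                headers: Vec::new(),
                body: message,
            },
            AppError::TooManyRequests { message, retry_after_secs } => {
                // Emit Retry-After only when a value is known.
                let headers = retry_after_secs
                    .map(|secs| ("Retry-After".to_string(), secs.to_string()))
                    .into_iter()
                    .collect();
                SimpleResponse { status: 429, headers, body: message }
            }
        }
    }
}
```

Keeping the retry hint inside the error value lets a handler return `Err(AppError::TooManyRequests { .. })` like any other failure, which is the point of adding a variant instead of building a response inline.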

Tests

- 2 new unit tests on AppError::TooManyRequests IntoResponse: 429 status, Retry-After header present/absent.
- 1 new integration test `test_per_user_rate_limit_triggers_429`: uses a new `common::setup_with_llm_limit(2, 60)` helper to pin the limit low, makes 3 requests, verifies the 3rd returns 429 with Retry-After in 1..=60.
- All existing tests (RBAC, parser, provider) untouched and still passing.

Docs

docs/ai-triage.{en,zh}.md gain a "Per-user rate limit" subsection between Auth and OpenAPI.

Test plan

- `cargo build --tests`: already green locally
- `cargo test --lib errors::`: 7/7 green locally (includes 2 new)
- CI runs the new integration test against Redis
- Manual smoke: set LLM_RATE_LIMIT_REQUESTS=2 and confirm the 3rd request returns 429 with Retry-After

🤖 Generated with Claude Code

The global per-IP rate limit doesn't fit the cost shape of LLM calls. This adds a second sliding window keyed on the authenticated user's UUID (stable across JWT and API-key auth) so:

- ten operators behind one NAT don't share a token budget
- one operator hammering from many IPs can't bypass the limit
- a per-VPS agent key (already blocked by RBAC) is gated again here

Behaviour:

- Configurable via LLM_RATE_LIMIT_REQUESTS (default 20) and LLM_RATE_LIMIT_WINDOW (default 3600s = 1 hour).
- On exhaustion: HTTP 429 with `Retry-After` header (seconds).
- Fails open if Redis is unreachable: availability over perfect accounting when our own infra hiccups.

New AppError::TooManyRequests variant with optional retry_after_secs mapped to the Retry-After header. Two unit tests on the IntoResponse mapping + a new integration test that sets the limit to 2 and verifies the third request gets 429 with a sane Retry-After value.

Docs: docs/ai-triage.{en,zh}.md gain a "Per-user rate limit" subsection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Backend PR #13 made /api/ai/triage admin/operator-only (403 for readonly and per-VPS agent keys), and PR #16 added a per-user LLM rate limit (429 with Retry-After). The web UI silently routed both into a generic red "Request failed" banner and still rendered the "Ask AI" button on the VPS detail page for every authenticated role.

VpsDetail: gate the "Ask AI" button behind canMutate (admin|operator), matching the other mutation actions on the page.

AiTriageDialog: branch on axios error status. 429 surfaces the Retry-After header as "Try again in X minutes Y seconds" in an amber panel (it's a recoverable hint, not a hard error). 403 renders a friendly "your role doesn't permit AI triage" amber panel; readonly users can still reach the standalone /ai-triage page from the sidebar, so the dialog still needs this branch. Other errors keep the existing red banner.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lai3d merged commit 0f9c2ef into main May 19, 2026
1 check passed

lai3d deleted the claude/triage-llm-rate-limit branch May 19, 2026 13:57

lai3d mentioned this pull request May 20, 2026

Handle 403 and 429 from /api/ai/triage in the UI #18

Merged

6 tasks

lai3d merged 1 commit into main from claude/triage-llm-rate-limit
