
Add per-user LLM rate limit to /api/ai/triage #16

Merged
lai3d merged 1 commit into main from claude/triage-llm-rate-limit on May 19, 2026

lai3d (Owner) commented on May 19, 2026

Summary

Adds a second rate-limit layer specifically for requests that spend LLM tokens. The existing global per-IP limiter doesn't fit the cost shape: it protects against connection pressure, not token budget, and an admin or operator who has already passed RBAC can still spend without bound.

Mechanism

A sliding window keyed on the authenticated user's UUID (CurrentUser.id, stable across both JWT and API-key auth). Implemented in routes::ai_triage::check_llm_rate_limit using the same Redis pattern as the global limiter but with a separate prefix (llm-rate:{user_id}).
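The PR doesn't show `check_llm_rate_limit` itself, so what follows is only a sketch of a sliding-window log under stated assumptions. An in-memory `HashMap` of timestamp queues stands in for whatever Redis structure the real limiter uses (a sorted set of timestamps is a common choice). Timestamps are whole seconds supplied by the caller, and rejected requests are not recorded. `SlidingWindow`, `Decision`, and `llm_key` are hypothetical names. Only the `llm-rate:{user_id}` key shape comes from the description above.

```rust
use std::collections::{HashMap, VecDeque};

/// Outcome of one rate-limit check (hypothetical type).
#[derive(Debug, PartialEq)]
enum Decision {
    Allowed,
    Limited { retry_after_secs: u64 },
}

/// In-memory stand-in for the Redis-backed window: each key holds the
/// timestamps (seconds) of accepted requests still inside the window.
struct SlidingWindow {
    max_requests: usize,
    window_secs: u64,
    log: HashMap<String, VecDeque<u64>>,
}

/// Key layout taken from the PR description: `llm-rate:{user_id}`.
fn llm_key(user_id: &str) -> String {
    format!("llm-rate:{user_id}")
}

impl SlidingWindow {
    fn new(max_requests: usize, window_secs: u64) -> Self {
        Self { max_requests, window_secs, log: HashMap::new() }
    }

    fn check(&mut self, user_id: &str, now_secs: u64) -> Decision {
        let (max, window) = (self.max_requests, self.window_secs);
        let entries = self.log.entry(llm_key(user_id)).or_default();

        // Drop timestamps that have slid out of the window.
        while let Some(&oldest) = entries.front() {
            if oldest + window > now_secs {
                break;
            }
            entries.pop_front();
        }

        if entries.len() >= max {
            // The oldest surviving entry frees the next slot when it expires.
            let retry_after_secs = entries
                .front()
                .map(|&oldest| (oldest + window).saturating_sub(now_secs))
                .unwrap_or(window)
                .max(1);
            return Decision::Limited { retry_after_secs };
        }

        // Accepted requests are recorded and count against the window.
        entries.push_back(now_secs);
        Decision::Allowed
    }
}
```

Because the oldest surviving timestamp is what frees the next slot, `retry_after_secs` always falls in `1..=window_secs`. That is the same range the integration test asserts for `Retry-After`. Keying on the user ID rather than the client IP is what gives each operator a separate budget.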

| Scenario | Behaviour |
|---|---|
| Within `LLM_RATE_LIMIT_REQUESTS` per window | 200; the request is counted against the user's window |
| Above limit | 429 Too Many Requests with `Retry-After: <secs>` |
| Redis unreachable | Fails open (warn log): availability over perfect accounting |

| Env var | Default | Purpose |
|---|---|---|
| `LLM_RATE_LIMIT_REQUESTS` | 20 | Max triages per window, per user |
| `LLM_RATE_LIMIT_WINDOW` | 3600 | Window length in seconds (1 hour) |

This limit is independent of the global RATE_LIMIT_REQUESTS / RATE_LIMIT_WINDOW settings. As a result, ten operators behind one NAT no longer share a token budget, and one operator hammering from many IPs can't bypass it.
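The sketch below reads the two settings with their documented defaults and applies the fail-open rule from the table. Several details are assumptions made for illustration: falling back to the default when a value fails to parse, using `eprintln!` in place of the real warn log, and the names `LlmRateConfig`, `parse_or`, and `allow_request`.

```rust
use std::env;

/// Settings for the per-user LLM limiter (hypothetical struct).
#[derive(Debug, PartialEq)]
struct LlmRateConfig {
    max_requests: u64,
    window_secs: u64,
}

/// Parse one setting, falling back to `default` when unset or not a number
/// (the fallback-on-garbage behaviour is an assumption).
fn parse_or(raw: Option<String>, default: u64) -> u64 {
    raw.and_then(|v| v.trim().parse().ok()).unwrap_or(default)
}

impl LlmRateConfig {
    fn from_env() -> Self {
        Self {
            max_requests: parse_or(env::var("LLM_RATE_LIMIT_REQUESTS").ok(), 20),
            window_secs: parse_or(env::var("LLM_RATE_LIMIT_WINDOW").ok(), 3600),
        }
    }
}

/// Fail-open policy: if the store can't be reached, warn and let the request through.
fn allow_request<E: std::fmt::Display>(store_result: Result<bool, E>) -> bool {
    match store_result {
        Ok(allowed) => allowed,
        Err(e) => {
            // Stand-in for the real warn-level log line.
            eprintln!("warn: LLM rate limiter unavailable, failing open: {e}");
            true
        }
    }
}
```

Failing open means a Redis outage briefly disables accounting instead of disabling triage. The PR accepts that trade-off explicitly.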

Why a new AppError variant

Existing handler errors map to 4 status codes; 429 was missing. Added AppError::TooManyRequests { message, retry_after_secs } so the new code path stays in the standard handler return type (Result<Json<T>, AppError>) rather than constructing a Response inline. Cheap, well-tested, and reusable for future endpoints that need 429.
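The real variant implements `IntoResponse`. To keep this sketch free of web-framework dependencies, the response is modelled as a plain struct, and `SimpleResponse` and `into_simple_response` are hypothetical names. The sketch shows the two behaviours the new unit tests pin down: the status is always 429, and a `Retry-After` header appears only when `retry_after_secs` is `Some`. The plain-text body is also an assumption, since the project may serialize errors as JSON.

```rust
/// Simplified stand-in for the handler error type; only the new variant is modelled.
#[derive(Debug)]
enum AppError {
    TooManyRequests { message: String, retry_after_secs: Option<u64> },
}

/// Minimal response model: status code, headers, body.
#[derive(Debug)]
struct SimpleResponse {
    status: u16,
    headers: Vec<(String, String)>,
    body: String,
}

impl AppError {
    /// Mirrors what the `IntoResponse` mapping is described as doing.
    fn into_simple_response(self) -> SimpleResponse {
        match self {
            AppError::TooManyRequests { message, retry_after_secs } => {
                let mut headers = Vec::new();
                if let Some(secs) = retry_after_secs {
                    // Retry-After carries delay-seconds as a decimal integer.
                    headers.push(("Retry-After".to_string(), secs.to_string()));
                }
                SimpleResponse { status: 429, headers, body: message }
            }
        }
    }
}
```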

Tests

  • 2 new unit tests on the IntoResponse mapping for AppError::TooManyRequests: 429 status, with the Retry-After header present or absent.
  • 1 new integration test `test_per_user_rate_limit_triggers_429`: uses a new `common::setup_with_llm_limit(2, 60)` helper to pin the limit low, makes 3 requests, verifies the 3rd returns 429 with Retry-After in 1..=60.
  • All existing tests (RBAC, parser, provider) untouched and still passing.

Docs

docs/ai-triage.{en,zh}.md gain a "Per-user rate limit" subsection between Auth and OpenAPI.

Test plan

  • cargo build --tests — already green locally
  • cargo test --lib errors:: — 7/7 green locally (includes 2 new)
  • CI runs the new integration test against Redis
  • Manual smoke: set LLM_RATE_LIMIT_REQUESTS=2 and confirm the 3rd request returns 429 with Retry-After

🤖 Generated with Claude Code

The global per-IP rate limit doesn't fit the cost shape of LLM calls.
This adds a second sliding window keyed on the authenticated user's
UUID (stable across JWT and API-key auth) so:
- ten operators behind one NAT don't share a token budget
- one operator hammering from many IPs can't bypass the limit
- a per-VPS agent key (already blocked by RBAC) is gated again here

Behaviour:
- Configurable via LLM_RATE_LIMIT_REQUESTS (default 20) and
  LLM_RATE_LIMIT_WINDOW (default 3600s = 1 hour).
- On exhaustion: HTTP 429 with `Retry-After` header (seconds).
- Fails open if Redis is unreachable — availability over perfect
  accounting when our own infra hiccups.

New AppError::TooManyRequests variant with optional retry_after_secs
mapped to the Retry-After header. Two unit tests on the IntoResponse
mapping + a new integration test that sets the limit to 2 and verifies
the third request gets 429 with a sane Retry-After value.

Docs: docs/ai-triage.{en,zh}.md gain a "Per-user rate limit" subsection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lai3d merged commit 0f9c2ef into main on May 19, 2026 (1 check passed)
lai3d deleted the claude/triage-llm-rate-limit branch on May 19, 2026 13:57
lai3d added a commit that referenced this pull request on May 20, 2026
Backend PR #13 made /api/ai/triage admin/operator-only (403 for readonly
and per-VPS agent keys), and PR #16 added a per-user LLM rate limit
(429 with Retry-After). The web UI silently routed both into a generic
red "Request failed" banner and still rendered the "Ask AI" button on
the VPS detail page for every authenticated role.

VpsDetail: gate the "Ask AI" button behind canMutate (admin|operator),
matching the other mutation actions on the page.

AiTriageDialog: branch on axios error status. 429 surfaces the
Retry-After header as "Try again in X minutes Y seconds" in an amber
panel (it's a recoverable hint, not a hard error). 403 renders a
friendly "your role doesn't permit AI triage" amber panel — readonly
users can still reach the standalone /ai-triage page from the sidebar,
so the dialog still needs this branch. Other errors keep the existing
red banner.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
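The status branching described in this follow-up commit can be sketched as follows. The dialog itself is TypeScript; this version is written in Rust only for consistency with the backend examples. The names `Panel`, `classify`, and `retry_hint` are hypothetical, as is the "Try again later" fallback for a missing or unparsable header. The minutes/seconds wording follows the commit message.

```rust
/// Which panel the dialog shows (hypothetical type).
#[derive(Debug, PartialEq)]
enum Panel {
    AmberRetry(String), // 429: recoverable hint
    AmberForbidden,     // 403: role doesn't permit AI triage
    RedError,           // anything else: existing red banner
}

/// Turn a Retry-After value (delay in seconds, as sent by the backend) into hint text.
fn retry_hint(retry_after: Option<&str>) -> String {
    match retry_after.and_then(|v| v.trim().parse::<u64>().ok()) {
        Some(secs) => format!("Try again in {} minutes {} seconds", secs / 60, secs % 60),
        None => "Try again later".to_string(),
    }
}

/// Map an HTTP error status (plus optional Retry-After) to a panel.
fn classify(status: u16, retry_after: Option<&str>) -> Panel {
    match status {
        429 => Panel::AmberRetry(retry_hint(retry_after)),
        403 => Panel::AmberForbidden,
        _ => Panel::RedError,
    }
}
```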