What happened?
Problem
When a polecat hits an upstream LLM provider rate limit (HTTP 429, quota exceeded, token/min exceeded), the current behavior is a blind retry. This wastes tokens and time, and the failure often cascades into a hard bead failure that requires a Mayor re-sling.
Polecats have no awareness of:
- Which model is rate-limited
- When the rate-limit window reopens
- Whether a fallback model is configured and available
Proposed behavior
Polecat dispatch loop should:
- Detect rate-limit responses: parse 429 / quota errors, extract model + provider + window duration when available from response headers (x-ratelimit-reset, retry-after).
- Window-aware backoff: if the response tells us when the limit resets, hold the retry until that timestamp rather than polling. If no window info, exponential backoff with a sane cap (e.g. 5min).
- Automatic model fallback: if a fallback model is configured (e.g. user has both kilo/anthropic/claude-sonnet and kilo/openai/gpt-4o available), switch to it for the remainder of the convoy rather than blocking.
- Bead-level signal: when a bead is held for rate-limit, mark it with a new status (e.g. rate_limited or a label like gt:rate-limited) so the Mayor can show progress accurately and avoid re-slinging.
- Mayor coordination: expose a way for the Mayor to query "is model X currently rate-limited?" so I can hold convoys proactively instead of reactively.
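To make the window-aware backoff item concrete, here is a minimal Python sketch of how a retry delay could be computed. It is an illustration under assumptions, not the existing dispatch code: the names retry_delay, parse_retry_after, BASE_BACKOFF_SECONDS and MAX_BACKOFF_SECONDS are hypothetical, the 1s base delay is assumed, and only retry-after is parsed because reset-header names and formats differ between providers.

```python
import math
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

MAX_BACKOFF_SECONDS = 300.0  # the "e.g. 5min" cap from the proposal
BASE_BACKOFF_SECONDS = 1.0   # assumed starting delay, not from the issue


def parse_retry_after(value, now):
    """Return seconds to wait for a Retry-After value, or None if unparseable.

    Retry-After may be a number of seconds or an HTTP date.
    """
    value = value.strip()
    try:
        seconds = float(value)
    except ValueError:
        seconds = None
    if seconds is not None and math.isfinite(seconds):
        return max(0.0, seconds)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())


def retry_delay(headers, attempt, now=None):
    """Seconds to hold a retry: provider hint if present, else capped backoff.

    `attempt` is the zero-based retry count for the held bead.
    """
    now = now or datetime.now(timezone.utc)
    lowered = {key.lower(): val for key, val in headers.items()}
    raw = lowered.get("retry-after")
    if raw is not None:
        hinted = parse_retry_after(raw, now)
        if hinted is not None:
            return hinted  # hold until the window reopens, no polling
    # No usable window info: exponential backoff with a cap.
    return min(MAX_BACKOFF_SECONDS, BASE_BACKOFF_SECONDS * 2 ** attempt)
```

A dispatch loop could sleep for retry_delay(response_headers, attempt) before retrying, or use the same value to decide whether switching to a fallback model is cheaper than waiting.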
Why now
Multiple users running parallel convoys will eventually hit provider limits. Right now the only signal is a failed bead with a cryptic error in the logs. A first-class rate-limit path makes the system robust to provider hiccups and makes model-fallback a real option instead of a manual hack.
Workarounds we have today
- Mayor re-slings failed beads manually (loses context, costly)
- Convoy serialization (works but throws away parallelism benefits)
- Just letting beads fail (current default)
Suggested implementation area
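Agent dispatch / scheduling (where the polecat loop lives). retry-after is a standard HTTP header, but the x-ratelimit-reset family is not uniform across major providers (Anthropic, OpenAI, Google): names and formats differ (for example, OpenAI's x-ratelimit-reset-requests and Anthropic's anthropic-ratelimit-requests-reset), so header parsing likely needs a small per-provider adapter.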
Related: model routing / provider abstraction layer would need a way to express "fallback chain" per-model.
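One possible shape for a per-model fallback chain, combined with the "is model X currently rate-limited?" query from the proposed behavior, is sketched below in Python. Everything here is hypothetical: FALLBACK_CHAINS, RateLimitRegistry and its methods are illustrative names, and the two model identifiers are simply the examples from this report.

```python
# Hypothetical per-model fallback chains, tried in order after the preferred model.
FALLBACK_CHAINS = {
    "kilo/anthropic/claude-sonnet": ["kilo/openai/gpt-4o"],
    "kilo/openai/gpt-4o": ["kilo/anthropic/claude-sonnet"],
}


class RateLimitRegistry:
    """Tracks when each model's rate-limit window reopens (epoch seconds)."""

    def __init__(self):
        self._reset_at = {}

    def mark_limited(self, model, reset_at):
        # Keep the latest known reset time if several polecats report one.
        self._reset_at[model] = max(reset_at, self._reset_at.get(model, 0.0))

    def is_rate_limited(self, model, now):
        # Answers the Mayor's query: is this model inside a held window?
        return now < self._reset_at.get(model, 0.0)

    def pick_model(self, preferred, now, chains=FALLBACK_CHAINS):
        """Return the first usable model in the chain, or None if all are held."""
        for candidate in [preferred, *chains.get(preferred, [])]:
            if not self.is_rate_limited(candidate, now):
                return candidate
        return None
```

With a shared registry like this, a polecat would call mark_limited when it sees a 429, and pick_model returning None would be the point at which a bead gets the rate_limited status instead of failing.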
Area
Agent Dispatch / Scheduling
Context
- Town ID: da6cde92-024a-497e-a990-6d967151003b
- Agent: Mayor (dcaf8478-17b1-41e7-be56-3c4aa9d0d59f)
Filed automatically by the Mayor via gt_report_bug.