[Gastown] Feature: polecat rate-limit awareness with model fallback and window-aware backoff

## What happened?

## Problem

When a polecat hits an upstream LLM provider rate limit (HTTP 429, quota exceeded, tokens-per-minute exceeded), the current behavior is a blind retry. This wastes tokens and time, and the failure often cascades into a hard bead failure that requires a Mayor re-sling.

Polecats have no awareness of:
- Which model is rate-limited
- When the rate-limit window reopens
- Whether a fallback model is configured and available

## Proposed behavior

The polecat dispatch loop should:

1. **Detect rate-limit responses**: parse 429 / quota errors and extract the model, the provider, and, when available, the window duration from response headers (`x-ratelimit-reset`, `retry-after`).

2. **Window-aware backoff**: if the response says when the limit resets, hold the retry until that timestamp rather than polling. If no window information is available, fall back to exponential backoff with a sane cap (e.g. 5 min).

3. **Automatic model fallback**: if a fallback model is configured (e.g. the user has both `kilo/anthropic/claude-sonnet` and `kilo/openai/gpt-4o` available), switch to it for the remainder of the convoy rather than blocking.

4. **Bead-level signal**: when a bead is held because of a rate limit, mark it with a new status (e.g. `rate_limited`) or a label (e.g. `gt:rate-limited`) so the Mayor can show progress accurately and avoid re-slinging.

5. **Mayor coordination**: expose a way for the Mayor to query "is model X currently rate-limited?" so that it can hold convoys proactively instead of reactively.
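
To make the backoff and Mayor-query steps concrete, here is a minimal sketch in Python. Everything named in it (`retry_delay`, `RateLimitRegistry`, the 2-second base delay) is hypothetical and chosen for illustration; only the 5-minute cap comes from the proposal above. It is not a description of existing Gastown code.

```python
from typing import Dict, Optional

BACKOFF_BASE_S = 2.0    # ASSUMPTION: base delay, not specified in the proposal
BACKOFF_CAP_S = 300.0   # the 5-minute cap suggested in the proposal


def retry_delay(attempt: int, reset_at: Optional[float], now: float) -> float:
    """Return how many seconds to wait before the next attempt.

    If the provider reported when the window reopens (reset_at, epoch
    seconds), wait until then. Otherwise use capped exponential backoff.
    """
    if reset_at is not None:
        return max(0.0, reset_at - now)
    return min(BACKOFF_CAP_S, BACKOFF_BASE_S * (2 ** attempt))


class RateLimitRegistry:
    """Hypothetical shared record of which models are currently held."""

    def __init__(self) -> None:
        self._held_until: Dict[str, float] = {}

    def mark_limited(self, model: str, until: float) -> None:
        # Keep the latest known reopen time for this model.
        self._held_until[model] = max(until, self._held_until.get(model, 0.0))

    def is_limited(self, model: str, now: float) -> bool:
        # Answers the Mayor's question: "is model X currently rate-limited?"
        until = self._held_until.get(model)
        if until is None:
            return False
        if now >= until:
            # The window has reopened, so drop the stale entry.
            del self._held_until[model]
            return False
        return True
```

In this sketch a polecat would call `mark_limited("kilo/anthropic/claude-sonnet", reset_at)` when it sees a 429, and the Mayor would call `is_limited(...)` before slinging a convoy. Passing `now` explicitly keeps the logic easy to test without touching the clock.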

## Why now

Multiple users running parallel convoys will eventually hit provider limits. Right now the only signal is a failed bead with a cryptic error in the logs. A first-class rate-limit path makes the system robust to provider hiccups and makes model-fallback a real option instead of a manual hack.

## Workarounds we have today

- Mayor re-slings failed beads manually (loses context, costly)
- Convoy serialization (works but throws away parallelism benefits)
- Just letting beads fail (current default)

## Suggested implementation area

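Agent dispatch / scheduling (where the polecat loop lives). `retry-after` is a standard HTTP header, while `x-ratelimit-reset`-style headers are common across major providers (Anthropic, OpenAI, Google) but differ in exact name and value format, so header parsing will likely need a small per-provider mapping.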
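
As a rough illustration of the header parsing, the sketch below handles both forms of `retry-after` defined by HTTP (delay in seconds, or an HTTP-date) using only the Python standard library. The function names are hypothetical. Treating `x-ratelimit-reset` as Unix epoch seconds is an assumption made for the example; real providers may use other names and formats.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def parse_retry_after(value: str, now: datetime) -> Optional[datetime]:
    """Parse a Retry-After value: delay-seconds or an HTTP-date."""
    value = value.strip()
    if value.isdigit():
        return now + timedelta(seconds=int(value))
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable, so the caller falls back to backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return when


def reset_time(headers: Mapping[str, str], now: datetime) -> Optional[datetime]:
    """Best-effort guess at when the rate-limit window reopens."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in lowered:
        parsed = parse_retry_after(lowered["retry-after"], now)
        if parsed is not None:
            return parsed
    raw = lowered.get("x-ratelimit-reset")
    if raw is not None:
        # ASSUMPTION: the value is Unix epoch seconds. Providers differ here.
        try:
            return datetime.fromtimestamp(float(raw), tz=timezone.utc)
        except (ValueError, OverflowError, OSError):
            return None
    return None
```

A `None` result means no usable window information was found, which is the case where the proposal falls back to capped exponential backoff.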

Related: the model routing / provider abstraction layer would need a way to express a per-model fallback chain.
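
One possible shape for such a fallback chain is sketched below in Python. The config layout and the `pick_model` helper are hypothetical; the model identifiers are the ones used as examples earlier in this issue.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical config: each model maps to an ordered list of fallbacks.
FALLBACK_CHAINS: Dict[str, List[str]] = {
    "kilo/anthropic/claude-sonnet": ["kilo/openai/gpt-4o"],
}


def pick_model(
    preferred: str,
    chains: Dict[str, List[str]],
    is_limited: Callable[[str], bool],
) -> Optional[str]:
    """Return the first model in the chain that is not rate-limited.

    Returns None when every candidate is limited, in which case the
    bead would be held instead of failed.
    """
    for candidate in [preferred, *chains.get(preferred, [])]:
        if not is_limited(candidate):
            return candidate
    return None
```

For example, `pick_model("kilo/anthropic/claude-sonnet", FALLBACK_CHAINS, lambda m: m.startswith("kilo/anthropic/"))` returns `"kilo/openai/gpt-4o"`. Taking `is_limited` as a callable keeps the routing logic independent of how rate-limit state is stored.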

## Area

Agent Dispatch / Scheduling

## Context

- **Town ID:** da6cde92-024a-497e-a990-6d967151003b
- **Agent:** Mayor (dcaf8478-17b1-41e7-be56-3c4aa9d0d59f)

*Filed automatically by the Mayor via `gt_report_bug`.*

[Gastown] Feature: polecat rate-limit awareness with model fallback and window-aware backoff #4057
