fix(ai): per-provider timeout in fallback chain + raise overall cap by alex-mextner · Pull Request #96 · alex-mextner/ExpenseSyncBot

alex-mextner · 2026-06-04T10:15:58Z

Problem

The AI agent was failing with «❌ Ошибка AI. Попробуйте позже.» ("AI error. Try again later.") on legitimate multi-round questions. Production logs confirm the cause: a question started at 09:17:59, and all three providers aborted at 09:18:59, exactly 60 seconds later. That is AGENT_TIMEOUT_MS firing, not provider flakiness.

Two separate bugs:

Fallback chain poisoned by a shared signal. A single AbortController (60s) wrapped the whole runAgentLoop (up to 10 rounds), and its signal was passed to every aiStreamRound call and to every fallback provider (z.ai → Gemini → HF). When the deadline fired mid-round, the signal stayed aborted for good, so each subsequent provider failed with APIUserAbortError in the same millisecond. The fallback was useless.
60s is too tight. z.ai glm-5.1 takes ~27s per round; a 3-round question needs ~81s > 60s, so it aborts.
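
To make the first bug concrete, here is a minimal sketch of a fallback loop that reuses one signal. `callProvider` and `runChainWithSharedSignal` are hypothetical stand-ins, not code from this repo, and the provider call is synchronous only to keep the demo short; the one assumption is that a provider refuses to start when handed an already-aborted signal.

```typescript
// Hypothetical stand-in for a provider call: like a real fetch/SDK call,
// it refuses to start when the signal it receives is already aborted.
function callProvider(name: string, signal: AbortSignal): string {
  if (signal.aborted) {
    throw new Error(`${name}: aborted`);
  }
  return `${name}: ok`;
}

// Tries each provider in order with the SAME signal and records each outcome.
function runChainWithSharedSignal(providers: string[], signal: AbortSignal): string[] {
  const outcomes: string[] = [];
  for (const name of providers) {
    try {
      outcomes.push(callProvider(name, signal));
      break; // first success ends the chain
    } catch (err) {
      outcomes.push((err as Error).message);
    }
  }
  return outcomes;
}

const shared = new AbortController();
shared.abort(); // simulates the overall deadline firing mid-round

// An aborted signal never resets, so no provider gets a real attempt.
const poisoned = runChainWithSharedSignal(["z.ai", "Gemini", "HF"], shared.signal);
console.log(poisoned);
```

All three entries come back as aborts, which matches the triple abort in the same millisecond seen in the logs.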

Fix

streaming.ts: each provider in the fallback chain gets a fresh AbortController (PER_PROVIDER_TIMEOUT_MS = 45_000), combined with the shared options.signal via AbortSignal.any. A slow provider is aborted after 45s, and the next one is tried with a clean signal. After a failure, checking options.signal?.aborted tells the two cases apart: overall deadline → stop the chain and throw AbortError; provider timeout → next provider.
agent.ts: AGENT_TIMEOUT_MS raised from 60s to 180s. The controller is moved out of the retry loop (one deadline for all attempts). An overall abort now shows the user «⏳ Время ожидания истекло» ("Timed out") instead of «❌ Ошибка AI» ("AI error").

Constants: 3×45s = 135s < 180s, so a single round can try all three providers within the budget.
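
A rough sketch of the per-provider signal and the abort classification described above. This is not the code from streaming.ts: `makeAttemptSignal`, `classifyFailure`, and the `expire`/`dispose` handles are hypothetical names, and the per-provider timeout is fired by hand so the demo does not wait 45 seconds. It assumes a runtime with `AbortSignal.any` (Node 20.3+ or a recent browser).

```typescript
const PER_PROVIDER_TIMEOUT_MS = 45_000; // value from the PR description

// Hypothetical helper: builds the signal for ONE provider attempt.
// The per-provider controller is fresh each time; the overall signal is shared.
function makeAttemptSignal(overall: AbortSignal | undefined, timeoutMs: number) {
  const perProvider = new AbortController();
  const timer = setTimeout(() => perProvider.abort(), timeoutMs);
  const signal = overall
    ? AbortSignal.any([overall, perProvider.signal])
    : perProvider.signal;
  return {
    signal,
    expire: () => perProvider.abort(), // demo only: fire the timeout by hand
    dispose: () => clearTimeout(timer), // always clear the timer after an attempt
  };
}

type NextStep = "stop-chain" | "next-provider";

// After a failed attempt, only the overall signal decides whether to go on.
function classifyFailure(overall: AbortSignal | undefined): NextStep {
  return overall?.aborted ? "stop-chain" : "next-provider";
}

const overall = new AbortController();

// Attempt 1: the provider is slow and its own timeout fires.
const first = makeAttemptSignal(overall.signal, PER_PROVIDER_TIMEOUT_MS);
first.expire();
first.dispose();
const afterProviderTimeout = classifyFailure(overall.signal);

// Attempt 2: a fresh controller, so the next provider starts with a clean signal.
const second = makeAttemptSignal(overall.signal, PER_PROVIDER_TIMEOUT_MS);
const secondStartsClean = !second.signal.aborted;
second.dispose();

// The overall deadline fires: the combined signal aborts and the chain stops.
overall.abort();
const afterOverallDeadline = classifyFailure(overall.signal);

// Budget check from the description: three providers fit inside 180s.
const worstCaseRoundMs = 3 * PER_PROVIDER_TIMEOUT_MS;
console.log({ afterProviderTimeout, secondStartsClean, afterOverallDeadline, worstCaseRoundMs });
```

The point of the design is that the per-provider controller is created inside the loop while the overall signal is only read, so a timeout on one provider cannot leak into the next attempt.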

Tests

TDD: fallback with a fresh signal after a provider timeout (failed on the old code, a regression test for Problem A); chain stops on the overall deadline and the error is classified as AbortError; a 3-round question completes; a stuck run is bounded by the deadline and ends with the timeout message.

Full suite: 3412/3412 green, tsc clean.

🤖 Generated with Claude Code

The AI agent failed with "Ошибка AI" ("AI error") on legitimate multi-round questions because of two distinct problems:

Problem A: a shared abort signal poisoned the fallback chain. A single AbortController with a 60s deadline wrapped the entire agent loop, and the same signal was handed to every provider in aiStreamRound's fallback chain. Once the deadline fired mid-round, the signal stayed aborted, so z.ai → Gemini → HF all rejected instantly with the same abort and the fallback was useless.

Problem B: the 60s overall budget was too tight. z.ai glm-5.1 takes ~27s per round; a 3-round query (~81s) exceeded 60s and aborted.

Fixes:
- Each provider in the fallback loop now gets its own fresh per-provider timeout (PER_PROVIDER_TIMEOUT_MS = 45s), combined with the caller's overall signal via AbortSignal.any. A slow provider aborts after 45s and the loop retries the next with a clean, non-aborted signal.
- The loop distinguishes the two abort causes: if the caller's overall signal is aborted, it stops the chain and throws an AbortError-classified error; if only the per-provider timeout fired, it continues to the next provider.
- Raise AGENT_TIMEOUT_MS to 180s and move the AbortController/timeout outside the retry loop so one deadline spans all retries. A truly stuck run is bounded by the cap, not cap × attempts.
- The overall-deadline abort is thrown as a plain Error with name='AbortError' so run()'s catch surfaces "Время ожидания истекло" ("Timed out") instead of the generic "Ошибка AI".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-04T10:16:22Z

🤖 Stage bot deployed! Test it: https://t.me/ExpenseSyncStageBot

Branch: fix/ai-agent-timeout-fallback @ c5f934a

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19d2a6fed9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-04T10:18:08Z

+ * the next one with a clean signal. NOT shared across providers, so one stuck
+ * provider does not poison the fallback chain.
+ */
+const PER_PROVIDER_TIMEOUT_MS = 45_000;


Preserve longer caller budgets for active streams

This 45s default now applies to every aiStreamRound caller even when the caller explicitly allows longer streaming: for example, the deep advice flow passes maxTokens: 3000 with a 120s AbortSignal.timeout and notes that deep responses need more time (src/bot/commands/ask.ts:31-40,336-340,426-428). Because this timer is wall-clock rather than idle-based, an otherwise healthy provider that is still streaming after 45s is aborted; since text has already been emitted, the fallback path refuses to switch providers and the advice generation fails/retries instead of using its intended 120s budget. Consider making this cap caller-configurable or enforcing it only when no progress is made.
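
One way to read the "only when no progress is made" suggestion is an idle watchdog that is reset on every streamed chunk. The sketch below is an assumption about how that could look, not something in this PR: `createIdleWatchdog`, `touch`, and `isStalled` are hypothetical names, and timestamps are passed in explicitly so the behaviour can be checked without real timers. A real version would presumably re-arm a setTimeout that aborts the per-provider AbortController.

```typescript
// Hypothetical idle watchdog: it trips only when no progress has been
// reported for idleMs, regardless of how long the stream has been running.
function createIdleWatchdog(idleMs: number, startedAt: number) {
  let lastProgressAt = startedAt;
  return {
    // Call on every streamed chunk.
    touch(now: number): void {
      lastProgressAt = now;
    },
    // True once the gap since the last chunk reaches idleMs.
    isStalled(now: number): boolean {
      return now - lastProgressAt >= idleMs;
    },
  };
}

const watchdog = createIdleWatchdog(45_000, 0);

// Healthy provider: one chunk every 10s for 100s of wall-clock time.
let abortedHealthyStream = false;
for (let t = 10_000; t <= 100_000; t += 10_000) {
  if (watchdog.isStalled(t)) abortedHealthyStream = true;
  watchdog.touch(t);
}

// Stuck provider: no chunks after the 100s mark.
const stillFineJustBefore = !watchdog.isStalled(100_000 + 44_999);
const stalledAfterIdleGap = watchdog.isStalled(100_000 + 45_000);
console.log({ abortedHealthyStream, stillFineJustBefore, stalledAfterIdleGap });
```

With this shape a stream that keeps emitting text for 100s is never cut off, while a provider that goes silent is still dropped 45s after its last chunk.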

Useful? React with 👍 / 👎.

alex-mextner wants to merge 1 commit into main from fix/ai-agent-timeout-fallback
