
fix(proxy): 加固 AgentPool 清理与透传 stats#762

Merged
ding113 merged 6 commits into ding113:dev from tesgth032:fix/hang-stuck-requesting-hardening
Feb 14, 2026

Conversation


@tesgth032 tesgth032 commented Feb 11, 2026

Summary

As follow-up hardening after #759 (AgentPool eviction blocking caused global requesting to hang), this PR does three things:

  1. Gemini passthrough stats: upgrades the response-body stats buffer from a tail-only window to a dual window (1 MB head + 9 MB tail), and inserts an SSE comment separator at the truncation point so head and tail cannot splice across events.
  2. SSE detection: usage parsing switches from responseText.includes("data:") to isSSEText(), avoiding false positives on JSON bodies that contain "data:" and keeping the downstream JSON repair/parse path clean.
  3. AgentPool cleanup: adds pendingCleanups tracking for fire-and-forget destroy/close plus a best-effort wait in shutdown (2 s cap + 60 s drop-ref), avoiding global blocking while still finishing cleanup gracefully and preventing memory leaks.
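The cleanup-tracking pattern in point 3 can be sketched as below. This is a minimal TypeScript sketch; CleanupTracker, track, and pendingCount are illustrative names, not the identifiers actually used in agent-pool.ts.

```typescript
// Minimal sketch of fire-and-forget cleanup tracking with a best-effort
// shutdown wait. Names are illustrative, not the real agent-pool.ts API.
class CleanupTracker {
  private pendingCleanups = new Set<Promise<void>>();

  get pendingCount(): number {
    return this.pendingCleanups.size;
  }

  // Track a fire-and-forget cleanup; never await it on the hot path.
  track(cleanup: Promise<void>): void {
    const tracked = cleanup
      .catch(() => {}) // swallow errors: cleanup failures must not propagate
      .finally(() => this.pendingCleanups.delete(tracked));
    this.pendingCleanups.add(tracked);
    // Drop-ref safety net: forget the promise after 60 s even if it never settles.
    const dropRef = setTimeout(() => this.pendingCleanups.delete(tracked), 60_000);
    (dropRef as unknown as { unref?: () => void }).unref?.();
  }

  // Best-effort wait on shutdown: at most 2 s, never indefinitely.
  async shutdown(): Promise<void> {
    if (this.pendingCleanups.size === 0) return;
    const pending = Array.from(this.pendingCleanups);
    let timeoutId: ReturnType<typeof setTimeout> | undefined;
    await Promise.race([
      Promise.allSettled(pending).then(() => {}),
      new Promise<void>((resolve) => {
        timeoutId = setTimeout(resolve, 2_000);
      }),
    ]);
    if (timeoutId !== undefined) clearTimeout(timeoutId);
    this.pendingCleanups.clear();
  }
}
```

Referencing tracked inside its own .finally() callback is safe because the callback only runs asynchronously, after the assignment completes.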

Key points

  • The idle watchdog starts only after the first non-empty body chunk arrives (avoiding false kills on slow starts / 0-byte chunks); the first-byte timeout is still covered by firstByteTimeoutStreamingMs (if configured).
  • Only internal stats and resource management are affected; the response passed through to the client is unchanged.
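The watchdog gating described above reduces to a small predicate. This is a sketch under the assumption that the stream loop tracks an isFirstChunk flag and knows whether a first-byte timeout is configured; the function name is hypothetical, and the fallback branch reflects the condition as it stands after the later commits in this PR.

```typescript
// Start the idle timer once real data has arrived; if no first-byte timeout
// is configured, also start it up front so at least one timeout is active.
function shouldStartIdleTimer(isFirstChunk: boolean, hasFirstByteTimeout: boolean): boolean {
  return !isFirstChunk || !hasFirstByteTimeout;
}
```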

Changed files

  • src/app/v1/_lib/proxy/response-handler.ts
  • src/lib/proxy-agent/agent-pool.ts
  • tests/unit/lib/provider-endpoints/probe.test.ts
  • tests/unit/lib/proxy-agent/agent-pool.test.ts

Testing

  • npm test
  • npm run typecheck

Greptile Summary

This PR hardens the AgentPool cleanup lifecycle and upgrades the Gemini passthrough stats buffering strategy.

  • Head+tail stats buffering: The response body stats buffer is upgraded from a single tail-window to a dual-window (1 MB head + 9 MB tail) design, with an SSE comment marker (: [cch_truncated]) inserted at the truncation boundary to prevent cross-event misparsing when head and tail are joined.
  • SSE detection fix: Usage parsing now uses isSSEText() (checks line-start patterns) instead of a naive responseText.includes("data:"), preventing false positives from JSON bodies containing "data:".
  • pendingCleanups tracking: Fire-and-forget destroy()/close() calls are now tracked in a Set<Promise<void>> with a 60-second drop-ref timeout, allowing shutdown() to do a best-effort 2-second wait for graceful cleanup without ever blocking indefinitely.
  • Test hardening: The shutdown test now verifies that destroy/close are actually mock functions before asserting call behavior, catching mock setup regressions early.
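The line-start check behind the SSE detection fix can be illustrated with a tiny re-implementation. This is not the project's actual isSSEText, just the idea it embodies.

```typescript
// Detect SSE by checking whether any line *starts* with an SSE field name,
// rather than searching for "data:" anywhere in the body.
function looksLikeSSE(text: string): boolean {
  return /^(?:data|event|id|retry):/m.test(text);
}
```

A JSON body such as {"note":"data: inside"} no longer matches, while a real data: line still does.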

Confidence Score: 3/5

  • The PR is generally safe for internal stats and resource management, but the known UTF-8 byte-accounting inaccuracy at the head/tail split boundary could cause subtle stats misattribution under certain conditions.
  • Score of 3 reflects that the core changes (AgentPool cleanup, SSE detection) are solid and well-motivated, but the head/tail buffer split has a known UTF-8 byte-accounting issue that could cause slightly inaccurate buffer boundary tracking. Since this only affects internal stats (not client-facing data), severity is moderate.
  • Pay close attention to src/app/v1/_lib/proxy/response-handler.ts — specifically the head/tail chunk-splitting logic around lines 1158-1174 where UTF-8 multi-byte sequence boundaries interact with byte accounting.

Important Files Changed

Filename Overview
src/app/v1/_lib/proxy/response-handler.ts Upgrades stats buffering from tail-only to head+tail window with SSE truncation marker, and replaces naive includes("data:") with isSSEText(). The UTF-8 byte-accounting issue at the head/tail split boundary (already discussed) remains the primary concern.
src/lib/proxy-agent/agent-pool.ts Adds pendingCleanups tracking for fire-and-forget destroy/close promises, with a 60s drop-ref safety net and 2s best-effort shutdown wait. Well-structured; minor benign race in cleanup tracking already noted.
tests/unit/lib/proxy-agent/agent-pool.test.ts Adds mock-verification assertions to the shutdown test to confirm destroy/close are vi.fn mocks before asserting call behavior. Straightforward improvement.

Flowchart

flowchart TD
    A[Upstream chunk arrives] --> B{value && byteLength > 0?}
    B -->|No| I[Skip to idle timer check]
    B -->|Yes| C{!inTailMode && headBytes < 1MB?}
    C -->|Yes| D{chunkSize > remainingHeadBytes?}
    C -->|No| H[pushChunk to tail]
    D -->|Yes| E[Split: headPart + tailPart]
    D -->|No| F[pushChunk whole to head]
    E --> F2[pushChunk headText to head]
    F2 --> G[pushChunk tailText to tail]
    G --> I
    F --> I
    H --> I
    I --> J{!isFirstChunk?}
    J -->|Yes| K[startIdleTimer]
    J -->|No| L[Continue loop]
    K --> L

    subgraph shutdown[AgentPool Shutdown]
        S1[closeAgent fire-and-forget] --> S2[Add to pendingCleanups]
        S2 --> S3[60s dropRef timeout]
        S3 --> S4{shutdown called?}
        S4 -->|Yes| S5[Promise.race: allSettled vs 2s timeout]
        S5 --> S6[pendingCleanups.clear]
    end

Last reviewed commit: fcc1716


coderabbitai bot commented Feb 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Refactors response-stream stats from a single buffer to head/tail sliding windows, rewriting the streaming push/join logic; introduces SSE-detection changes and usage-parsing adjustments; adds non-blocking pending-cleanup tracking with timeout-based reference dropping to the agent pool; updates several unit-test mocks and assertions to match the new behavior.

Changes

Cohort / File(s) Summary
Streaming response handling and buffering
src/app/v1/_lib/proxy/response-handler.ts
Replaces the single stats buffer with head/tail windows (adds MAX_STATS_HEAD_BYTES / MAX_STATS_TAIL_BYTES / MAX_STATS_TAIL_CHUNKS, headChunks/tailChunks, inTailMode/wasTruncated); rewrites pushChunk/joinChunks/joinTailChunks to fill the head first and insert an SSE-safe marker on truncation; updates idle/first-byte timeout logs to expose head/tail metrics; switches SSE usage parsing to isSSEText(responseText) and adjusts the Claude/Gemini usage-merge and finalization paths.
Agent pool cleanup lifecycle
src/lib/proxy-agent/agent-pool.ts
Adds a private pendingCleanups: Set<Promise<void>> field to AgentPoolImpl; makes agent eviction/close fire-and-forget with best-effort tracking of the cleanup promises; shutdown() uses allSettled to wait for the close of existing cache entries, waiting at most 2000 ms when cleanups are pending; adds a 60000 ms drop-ref timeout and removal-on-completion for tracked promises.
Test mocks and assertions
tests/unit/lib/provider-endpoints/probe.test.ts, tests/unit/lib/proxy-agent/agent-pool.test.ts
Adds getEndpointCircuitStateSync and resetEndpointCircuit to the endpoint-circuit-breaker mock in the provider-endpoints tests; adds assertions in the agent-pool tests that agent.destroy/agent.close are mock functions and verifies shutdown prefers destroy (avoiding a hanging close).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title matches the changeset: it names the main changes (hardening AgentPool cleanup and stats passthrough), consistent with the PR's core changes (AgentPool cleanup tracking and the stats buffer redesign).
Description check ✅ Passed The PR description details the three main improvements (the Gemini passthrough stats buffer upgrade, the SSE-detection fix, and AgentPool cleanup tracking) and is highly relevant to the changes.


@gemini-code-assist

Summary of Changes

Hello @tesgth032, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves robustness and stability. It optimizes the AgentPool cleanup flow so pending cleanup tasks are handled more gracefully at service shutdown while avoiding potential memory leaks. It also improves the Gemini passthrough stats mechanism: a smarter buffering strategy and proper SSE text detection avoid the memory blowups and parse errors that very large response bodies could otherwise cause, improving both defensiveness and data-handling accuracy.

Highlights

  • AgentPool cleanup hardening: the cleanup mechanism is now controllable and non-blocking in shutdown scenarios, and avoids memory leaks from promises that never settle.
  • Gemini passthrough stats defense: the stats buffering strategy is upgraded from a tail-only window to head (1 MB) + tail (9 MB), with an SSE comment marker on truncation, avoiding the memory/parsing risks of large response bodies and SSE/JSON misdetection.
  • Unit-test fix: fixes the unit tests in probe.test.ts by adding the new circuit-breaker mock exports to match the dev-branch baseline.
Changelog
  • src/app/v1/_lib/proxy/response-handler.ts
    • Introduces the isSSEText function to improve SSE text detection.
    • Changes the stats buffering strategy from a single tail window to a dual head (1 MB) + tail (9 MB) window.
    • Inserts blank lines and an SSE comment (: [cch_truncated]) when the body is truncated, preventing cross-event misparsing when head and tail are joined.
    • Updates parseUsageFromResponseText to use isSSEText for SSE detection, reducing false positives.
    • Adjusts the idle-timer start logic so it also acts as a fallback when no first-byte timeout is configured.
    • Updates how bufferedBytes is reported in logs to reflect the new head and tail buffers.
  • src/lib/proxy-agent/agent-pool.ts
    • Adds a pendingCleanups Set to track in-flight agent destroy/close promises.
    • Changes shutdown() to best-effort wait for tasks in pendingCleanups, with a 2-second cap to avoid waiting indefinitely.
    • Tracks the cleanup promises created by destroy and close operations, dropping their references from pendingCleanups after 60 seconds so promises that never settle cannot leak memory.
  • tests/unit/lib/provider-endpoints/probe.test.ts
    • Adds mock implementations of getEndpointCircuitStateSync and resetEndpointCircuit to the endpoint-circuit-breaker mock, fixing the unit tests.
  • tests/unit/lib/proxy-agent/agent-pool.test.ts
    • Adds assertions that agent.destroy and agent.close are mock functions, keeping the tests sound.
Activity
  • This PR follows up on PR #759 (fix(proxy): fix stuck requests caused by AgentPool eviction blocking) after it merged into dev, adding further stability and defensive hardening based on feedback from multiple AI reviews.
  • The author provided detailed verification steps, including running npm run typecheck and npm test, to ensure code quality and correctness.

@github-actions github-actions bot added labels: bug (Something isn't working), area:core, size/S (Small PR, < 200 lines) — Feb 11, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR hardens the AgentPool cleanup mechanism and the Gemini stats passthrough logic, significantly improving stability and robustness. The AgentPool shutdown flow is more resilient thanks to the best-effort wait with a timeout and the memory-leak defense against dangling promises. Upgrading large Gemini response handling to a head+tail window with a truncation marker is good defensive design and memory optimization. The test fixes also keep code quality up. Overall this is a high-quality improvement. I found only one thing to optimize in AgentPool's shutdown method: the serial agent-closing loop should be made parallel again for efficiency and idiomatic style.

Comment on lines 341 to 343
for (const [key, cached] of this.cache.entries()) {
closePromises.push(this.closeAgent(cached.agent, key));
await this.closeAgent(cached.agent, key);
}

medium

Because of the await inside the for...of loop, agents are now closed serially. Although closeAgent is designed to return quickly, running these calls in parallel with Promise.all is more idiomatic and slightly more efficient. Consider Array.from plus map for a concise parallel implementation.

Suggested change
for (const [key, cached] of this.cache.entries()) {
closePromises.push(this.closeAgent(cached.agent, key));
await this.closeAgent(cached.agent, key);
}
await Promise.all(
Array.from(this.cache.entries()).map(([key, cached]) =>
this.closeAgent(cached.agent, key)
)
);


@greptile-apps greptile-apps bot left a comment


4 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +1070 to 1079
// Fill the head first; once the head cap is exceeded, switch to tail (this alone does not mean truncation occurred; only tail overflow counts as truncation)
if (!inTailMode) {
if (headBufferedBytes + bytes <= MAX_STATS_HEAD_BYTES) {
headChunks.push(text);
headBufferedBytes += bytes;
return;
}

inTailMode = true;
}

Head window drops bytes

In passthrough stats buffering, when headBufferedBytes + bytes would exceed MAX_STATS_HEAD_BYTES, the code switches to tail mode and drops the current chunk entirely from the head, even though part of it could still fit. This creates an unintended gap between head and tail (more truncation than necessary) and can break parsing if important metadata starts near the end of the head window. Consider splitting the chunk (take remaining head capacity, then send the rest to tail) so the head window is truly “first N bytes.”
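The suggested fix (consume the remaining head capacity, route the rest to tail) could be sketched like this. splitAtHeadBoundary is a hypothetical helper for illustration, not code from the PR.

```typescript
// Split an incoming byte chunk at the head window's remaining capacity so
// the head buffer is truly the first N bytes of the stream.
function splitAtHeadBoundary(
  chunk: Uint8Array,
  headBufferedBytes: number,
  maxHeadBytes: number,
): { headPart: Uint8Array; tailPart: Uint8Array } {
  const remaining = Math.max(0, maxHeadBytes - headBufferedBytes);
  return {
    headPart: chunk.subarray(0, remaining), // may be empty if the head is already full
    tailPart: chunk.subarray(remaining),
  };
}
```

Note that a byte-level split can land in the middle of a UTF-8 multi-byte sequence, which is exactly the byte-accounting caveat raised later in this review thread.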



greptile-apps bot commented Feb 11, 2026

Additional Comments (1)

src/lib/proxy-agent/agent-pool.ts
Shutdown can hang

shutdown() now does await this.closeAgent(...) for each cached entry, but closeAgent() is explicitly fire-and-forget and does not await the underlying destroy()/close() promise. This means pendingCleanups won’t be populated in time for the subsequent best-effort wait, and shutdown() will return immediately without actually waiting for any cleanup. If the intent is to best-effort wait during shutdown, closeAgent() needs a mode that returns/awaits the cleanup promise (or shutdown() should directly enqueue/track promises without awaiting closeAgent()).



@github-actions github-actions bot left a comment


Code Review Summary

No significant issues identified in this PR. The changes are well-structured defensive hardening for the AgentPool cleanup lifecycle and the passthrough stats buffering strategy. The fire-and-forget cleanup with pendingCleanups tracking, the head+tail buffer split with SSE-safe truncation markers, and the isSSEText() adoption for more accurate SSE detection are all sound approaches.

PR Size: S

  • Lines changed: 230 (174 additions, 56 deletions)
  • Files changed: 4

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean
  • Type safety - Clean
  • Documentation accuracy - Clean
  • Test coverage - Adequate
  • Code clarity - Good

Notes

AgentPool (agent-pool.ts):

  • pendingCleanups tracking with 60s stale-reference timeout and 2s best-effort shutdown wait is a solid pattern for handling never-settling dispatchers.
  • The trackedPromise self-removal in .finally() correctly leverages closure timing (assigned before the callback executes asynchronously).

Response handler (response-handler.ts):

  • Head (1MB) + tail (9MB) buffer split with \n\n: [cch_truncated]\n\n separator correctly prevents cross-event splicing when head and tail are joined, since parseSSEData ignores : comment lines and empty lines flush events.
  • isSSEText() replacing responseText.includes("data:") eliminates false positives from JSON payloads containing the substring "data:".
  • hasFirstByteTimeout guard on idle timer startup correctly ensures at least one timeout is always active.
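The reason the marker is SSE-safe can be seen from a minimal join sketch (illustrative names; the real joinChunks logic differs).

```typescript
// Join head and tail stats windows. When truncation occurred, separate them
// with blank lines plus an SSE comment line: SSE parsers ignore ": ..." comment
// lines, and the blank lines flush any partially buffered event, so head and
// tail can never splice into one bogus event.
function joinHeadTail(headChunks: string[], tailChunks: string[], wasTruncated: boolean): string {
  const head = headChunks.join("");
  const tail = tailChunks.join("");
  return wasTruncated ? `${head}\n\n: [cch_truncated]\n\n${tail}` : head + tail;
}
```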

Tests:

  • Probe test mock additions (getEndpointCircuitStateSync, resetEndpointCircuit) are mechanical fixes for new exports on dev baseline.
  • Agent pool test additions validate mock function integrity before asserting on call behavior.

Automated review by Claude AI

@tesgth032 tesgth032 force-pushed the fix/hang-stuck-requesting-hardening branch from c2240ca to 0e5b662 on February 11, 2026 05:41
@tesgth032

Based on this PR's AI reviews, two key boundary cases have been addressed:

1) response-handler.ts head+tail stats buffering: chunks at the head boundary are now split by byte, guaranteeing the head really is the "first 1 MB" and preventing a whole chunk from falling into the tail and leaving an unnecessary gap in the head.
2) agent-pool.ts shutdown(): closeAgent() is now triggered in parallel (Promise.allSettled), ensuring pendingCleanups is populated before the best-effort wait, while keeping the eviction/cleanup path non-blocking (so global requesting cannot get stuck again).

Verified locally: npm test and npm run typecheck pass.


@greptile-apps greptile-apps bot left a comment


4 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +1194 to +1198
const headText = decoder.decode(headPart, { stream: true });
pushChunk(headText, remainingHeadBytes);

const tailText = decoder.decode(tailPart, { stream: true });
pushChunk(tailText, chunkSize - remainingHeadBytes);

Splitting a chunk and calling decode() twice sequentially on the same decoder maintains UTF-8 multi-byte sequence integrity because stream: true preserves incomplete sequences between calls - this is correct. However, pushChunk(headText, remainingHeadBytes) on line 1195 passes the original byte count (remainingHeadBytes), but headText may have fewer characters if the split occurred mid-UTF-8-sequence (decoder will withhold those bytes until the next call). This means headBufferedBytes could become incorrect, potentially under-filling the head window.

Consider tracking the actual decoded byte length or adjusting the accounting to handle incomplete sequences.


Comment on lines +352 to +358
await Promise.race([
Promise.allSettled(pending).then(() => {}),
new Promise<void>((resolve) => {
timeoutId = setTimeout(resolve, WAIT_MS);
timeoutId.unref();
}),
]);

Race condition: if a cleanup promise settles and removes itself from pendingCleanups (line 421) after line 348 creates the array snapshot but before the Promise.race starts, that promise is still included in pending but no longer tracked. When it later settles during the race, line 421 tries to delete from an already-cleared set (line 363 clears after race completes), causing a harmless but unintended delete operation on an empty set.

Low severity - functionally safe, but the timing window could be tightened by checking this.pendingCleanups.size again after the race or accepting this benign race.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/app/v1/_lib/proxy/response-handler.ts`:
- Around lines 1110-1114: chunksCollected currently counts only tail chunks via Math.max(0, tailChunks.length - tailHead) and omits head chunks. Update the calculation to headChunks.length + Math.max(0, tailChunks.length - tailHead) so chunksCollected reports the total number of collected chunks (using the existing headChunks, tailChunks, and tailHead variables).
🧹 Nitpick comments (1)
src/lib/proxy-agent/agent-pool.ts (1)

403-431: The cleanup-promise tracking and the 60 s safety timeout are implemented correctly.

A few confirmations:

  • Referencing trackedPromise inside its own .finally() closure is safe: the variable is assigned before the callback executes asynchronously.
  • The 60 s setTimeout + unref() backstop guarantees no memory leak even if the dispatcher never settles.
  • pendingCleanups.delete on a missing element is a no-op and does not conflict with .clear() in shutdown().

One minor style suggestion: trackedPromise can be declared const (the closure captures the reference, and the variable is initialized by the time the async callback runs), which is semantically more precise.

Optional: let → const
-      let trackedPromise: Promise<void>;
-      let dropRefTimeoutId: NodeJS.Timeout | null = null;
-
-      trackedPromise = cleanupPromise
+      let dropRefTimeoutId: NodeJS.Timeout | null = null;
+
+      const trackedPromise: Promise<void> = cleanupPromise

@tesgth032

Updated per CodeRabbit's suggestions:

  • chunksCollected now includes headChunks.length (head + tail total)
  • trackedPromise is now declared const (more precise semantics)


@greptile-apps greptile-apps bot left a comment


4 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +1194 to +1195
const headText = decoder.decode(headPart, { stream: true });
pushChunk(headText, remainingHeadBytes);

When splitting at a UTF-8 multi-byte sequence boundary, decoder.decode(headPart, { stream: true }) may withhold incomplete bytes (e.g., 2 bytes of a 3-byte character). The decoder returns text representing fewer bytes than remainingHeadBytes, but pushChunk is told the full remainingHeadBytes count. This causes headBufferedBytes to overcount by 1-3 bytes, triggering premature switch to tail mode and under-filling the head buffer.

Calculate actual decoded byte length:

Suggested change
const headText = decoder.decode(headPart, { stream: true });
pushChunk(headText, remainingHeadBytes);
const headText = decoder.decode(headPart, { stream: true });
const actualHeadBytes = new TextEncoder().encode(headText).byteLength;
pushChunk(headText, actualHeadBytes);

Comment on lines +1197 to +1198
const tailText = decoder.decode(tailPart, { stream: true });
pushChunk(tailText, chunkSize - remainingHeadBytes);

Same byte accounting issue: chunkSize - remainingHeadBytes represents the tail part's byte count, but if UTF-8 bytes were withheld from the head decode, those bytes are prepended internally by the decoder when decoding tailPart. The decoded tailText will contain more characters than chunkSize - remainingHeadBytes would suggest, leading to inaccurate byte tracking.

Suggested change
const tailText = decoder.decode(tailPart, { stream: true });
pushChunk(tailText, chunkSize - remainingHeadBytes);
const tailText = decoder.decode(tailPart, { stream: true });
const actualTailBytes = new TextEncoder().encode(tailText).byteLength;
pushChunk(tailText, actualTailBytes);

// Start the idle timer only after the first chunk arrives (avoid overlapping with the first-byte timeout's responsibility)
if (!isFirstChunk) {
// If no first-byte timeout is configured, also start the idle timer after done=false as a fallback (covers 0-byte chunks / value=undefined)
if (!isFirstChunk || !hasFirstByteTimeout) {

Idle timer now starts even before first chunk arrives when !hasFirstByteTimeout. This means for streams with 0-byte chunks or value=undefined, the idle timer could fire before any real data arrives, potentially aborting legitimate slow-starting streams. Consider whether the idle timer should truly be a fallback for missing first-byte timeout, or if it should only start after actual data (non-empty value).


// 1. Standard SSE (event: + data:) - Claude/OpenAI
// 2. Bare data: format - Gemini
if (!usageMetrics && responseText.includes("data:")) {
if (!usageMetrics && isSSEText(responseText)) {

Replaced naive responseText.includes("data:") with isSSEText() utility for more accurate SSE format detection. This prevents false positives from JSON bodies containing the string "data:" and correctly identifies SSE by checking line-start patterns.


closePromises.push(this.closeAgent(cached.agent, key));
// Best-effort: wait for some pending cleanups to finish, but never indefinitely (avoid repeating the "close() waits for in-flight" mistake)
if (this.pendingCleanups.size > 0) {
const pending = Array.from(this.pendingCleanups);

Timing window: if a cleanup promise settles between this snapshot (line 348) and when Promise.race starts (line 352), its .finally() handler (line 420) will try to delete from pendingCleanups even though it's already in the pending array. When the race completes and line 363 clears the set, that promise's delete becomes a no-op on an empty set. Benign but worth noting.


- AgentPool: destroy/close are fire-and-forget with pending-cleanup tracking; shutdown waits best-effort

- Gemini passthrough stats: head+tail buffering, truncation flush boundary, idle-watchdog fallback

- Usage SSE detection now uses isSSEText, avoiding false positives that disturb the JSON repair path

- Hardened the AgentPool unit-test assertions on the undici mock
- Added getEndpointCircuitStateSync/resetEndpointCircuit so the new exports don't break vitest mock access
@tesgth032 tesgth032 force-pushed the fix/hang-stuck-requesting-hardening branch from c91ee3f to a172ad8 on February 11, 2026 07:16

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/app/v1/_lib/proxy/response-handler.ts`:
- Around lines 1178-1182: the idle timer is started before the first chunk whenever hasFirstByteTimeout is false, prematurely terminating slow-starting streams. In the reader.read loop, start the idle timer only after the first chunk arrives (i.e., require !isFirstChunk); or, if a fallback is wanted, introduce a separate firstByteIdleFallbackMs that runs a dedicated first-byte fallback timer (wider than streamingIdleTimeoutMs) before the first chunk, and only begin the regular startIdleTimer() afterwards. Either way, change the condition that currently reads (!isFirstChunk || !hasFirstByteTimeout) to (!isFirstChunk), or route through the fallback path when hasFirstByteTimeout is false.
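The arming rule this suggestion asks for can be modeled as a small per-chunk gate. A sketch of the intended condition — class and method names are illustrative, not the repo's actual code:

```typescript
// The idle watchdog is only (re)armed once a non-empty body chunk has been
// seen; before that, only the first-byte timeout (if configured) guards the
// connection, so slow starts and 0-byte chunks are never misjudged as idle.
class IdleWatchdogGate {
  private sawNonEmptyChunk = false;

  /** Returns true when the idle timer should be (re)armed for this chunk. */
  onChunk(byteLength: number): boolean {
    if (byteLength > 0) this.sawNonEmptyChunk = true; // first real bytes arm it
    return this.sawNonEmptyChunk; // 0-byte chunks before the first byte never arm
  }
}
```

The caller would invoke `onChunk` on every read and call its `startIdleTimer()` only when the gate returns true.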
🧹 Nitpick comments (4)
tests/unit/lib/provider-endpoints/probe.test.ts (1)

52-56: The mock setup is highly repetitive; consider extracting it into a helper function.

The mock of @/lib/endpoint-circuit-breaker is identical across all 10 test cases (the logger and @/repository mocks are also near-duplicates). Consider extracting a setupCommonMocks() helper so that adding a field later only requires one change.

Example
function mockCircuitBreaker() {
  return {
    getEndpointCircuitStateSync: vi.fn(() => "closed"),
    recordEndpointFailure: vi.fn(async () => {}),
    resetEndpointCircuit: vi.fn(async () => {}),
  };
}
// In each test:
vi.doMock("@/lib/endpoint-circuit-breaker", () => mockCircuitBreaker());
src/lib/proxy-agent/agent-pool.ts (1)

335-370: The refactored shutdown() behaves correctly; one small detail is worth noting.

Because shutdown() now calls closeAgent + cache.clear() directly (bypassing evictByKey), stats.evictedAgents is not incremented during shutdown. If monitoring relies on that counter to confirm how many agents were cleaned up during shutdown, it will read low. Since this is a shutdown scenario, it usually does not matter in practice.

src/app/v1/_lib/proxy/response-handler.ts (2)

1157-1175: Splitting a multi-byte UTF-8 character at the head boundary can leave head slightly under 1 MB.

value.subarray(0, remainingHeadBytes) may cut through the middle of a multi-byte UTF-8 character. decoder.decode(headPart, { stream: true }) buffers the incomplete trailing bytes in the decoder's internal state, and those bytes are emitted with the output of the subsequent decoder.decode(tailPart, { stream: true }). As a result:

  1. headText may contain slightly less text than expected (at most 3 bytes per split).
  2. The bytes argument of pushChunk(headText, remainingHeadBytes) slightly overstates the actual text bytes.

For stats/billing this deviation is negligible (the head cap is itself an approximation), but if precision matters, a byte-level correction with TextEncoder works:

Optional fix: precise byte counting
                    const headText = decoder.decode(headPart, { stream: true });
-                   pushChunk(headText, remainingHeadBytes);
+                   const actualHeadBytes = new TextEncoder().encode(headText).byteLength;
+                   pushChunk(headText, actualHeadBytes);

                    const tailText = decoder.decode(tailPart, { stream: true });
-                   pushChunk(tailText, chunkSize - remainingHeadBytes);
+                   pushChunk(tailText, chunkSize - actualHeadBytes);

1437-1440: The non-Gemini stream path still uses an unbounded chunks array, without head/tail buffering protection.

The Gemini passthrough now has head/tail dual-window buffering to prevent OOM, but the non-Gemini stream path (line 1440, const chunks: string[] = []) still accumulates every chunk without bound. For very large responses this carries the same memory risk.

Consider extracting the head/tail buffering logic into a reusable utility later and using it on both paths.
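Such a reusable utility could look roughly like this. The 1 MB/9 MB limits and the `: [cch_truncated]` SSE comment marker follow the PR description, but the class and its API are an illustrative sketch, not the repo's implementation:

```typescript
// Head+tail dual-window stats buffer: keeps the first headMaxBytes and the
// most recent tailMaxBytes of a streamed body, dropping the middle. An SSE
// comment marker at the truncation boundary prevents the head and tail
// halves of different events from being glued into one bogus event.
class HeadTailBuffer {
  private head: string[] = [];
  private tail: { text: string; bytes: number }[] = [];
  private headBytes = 0;
  private tailBytes = 0;
  private truncated = false;

  constructor(
    private readonly headMaxBytes = 1 * 1024 * 1024,
    private readonly tailMaxBytes = 9 * 1024 * 1024,
  ) {}

  pushChunk(text: string, bytes: number): void {
    if (this.headBytes < this.headMaxBytes) {
      this.head.push(text);
      this.headBytes += bytes;
      return;
    }
    this.tail.push({ text, bytes });
    this.tailBytes += bytes;
    // Drop the oldest tail chunks once over the cap; mark truncation.
    while (this.tailBytes > this.tailMaxBytes && this.tail.length > 1) {
      const dropped = this.tail.shift()!;
      this.tailBytes -= dropped.bytes;
      this.truncated = true;
    }
  }

  join(): string {
    const headText = this.head.join("");
    const tailText = this.tail.map((c) => c.text).join("");
    if (!this.truncated) return headText + tailText;
    return `${headText}\n: [cch_truncated]\n${tailText}`;
  }
}
```

With this extracted, both the Gemini passthrough and the non-Gemini stream path could share one bounded-memory implementation.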


@greptile-apps greptile-apps bot left a comment


4 files reviewed, 1 comment


Comment on lines +1164 to +1168
const headText = decoder.decode(headPart, { stream: true });
pushChunk(headText, remainingHeadBytes);

const tailText = decoder.decode(tailPart, { stream: true });
pushChunk(tailText, chunkSize - remainingHeadBytes);

UTF-8 byte accounting wrong

In the head/tail split path, pushChunk(headText, remainingHeadBytes) and pushChunk(tailText, chunkSize - remainingHeadBytes) assume the decoder returns text for exactly those byte counts. If headPart ends mid–multi-byte UTF-8 sequence, TextDecoder.decode(headPart, { stream: true }) will buffer 1–3 bytes internally and not emit them until the next decode call. That means headBufferedBytes/tailBufferedBytes can become inaccurate, and you can either (a) switch to tail mode too early (under-filling head) or (b) exceed the intended limits without setting wasTruncated correctly. This is observable whenever a chunk boundary/split hits a multi-byte sequence.

Also appears at src/app/v1/_lib/proxy/response-handler.ts:1165 and :1168.

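The streaming behavior behind this comment can be observed directly with `TextDecoder`: a head slice that ends mid-sequence emits fewer characters than its byte count, and the held-back bytes surface in the tail decode.

```typescript
// "a€b" is 5 bytes (0x61, then 0xE2 0x82 0xAC for "€", then 0x62);
// cutting after byte 2 splits the 3-byte "€" in half.
const bytes = new TextEncoder().encode("a€b");
const decoder = new TextDecoder("utf-8");

const headPart = bytes.subarray(0, 2); // 0x61 0xE2 — ends mid-sequence
const tailPart = bytes.subarray(2);    // 0x82 0xAC 0x62

// Only "a" comes out: the dangling 0xE2 is buffered inside the decoder,
// so the head text represents 1 byte even though 2 bytes went in.
const headText = decoder.decode(headPart, { stream: true });
// The buffered byte is prepended here: "€b" emerges from a 3-byte input.
const tailText = decoder.decode(tailPart, { stream: true });
```

This is exactly why byte counts passed alongside decoded text drift when a split lands inside a multi-byte sequence.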

@tesgth032
Contributor Author

I have gone through the AI reviews' actionable suggestions one by one and adopted them (limited to this PR's scope):

  1. response-handler.ts: the idle watchdog now starts only after the first non-empty body chunk, so slow-starting streams are no longer killed prematurely when firstByteTimeoutStreamingMs is unconfigured (first-byte timeout remains the job of firstByteTimeoutStreamingMs).
  2. Confirmed that chunksCollected counts head+tail totals, trackedPromise uses const, and shutdown() triggers closes in parallel so pendingCleanups can be awaited best-effort.
  3. The PR has been rebased onto the latest dev, avoiding any rollback/deletion of test files added upstream.

All Actions checks are currently green and reviewDecision=APPROVED; ready to merge.

@ding113 ding113 merged commit 81e3ad7 into ding113:dev Feb 14, 2026
8 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Feb 14, 2026

Labels

area:core · bug (Something isn't working) · size/S (Small PR, < 200 lines)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants