Conversation
📝 WalkthroughWalkthrough本次变更:在代理转发与响应处理链增加“假 200”/上游错误文本嗅探与状态码推断,将推断元数据贯穿 provider-chain 与统计,后端在启用冗长模式时返回脱敏上游详情;前端添加相应 i18n 文案与 UI(推断徽章、原因展示);并补充大量单元测试覆盖这些路径。 Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @ding113, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the system's ability to identify and report errors that are masked by a successful HTTP 200 status code from upstream providers. It introduces sophisticated content-based detection for various 'fake 200' scenarios, allowing the system to infer and utilize more accurate error status codes for internal logic like retries and circuit breaking. The user interface now provides clearer explanations for these errors, and a new configuration option enables more verbose error details in API responses for debugging. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request significantly improves the detection and logging of "fake 200" errors, enhancing observability and troubleshooting with well-structured changes for error detection, handling, and presentation, including status code inference and extensive test coverage. However, a critical security issue was identified regarding the incomplete redaction of sensitive information in error messages returned to clients, which could lead to the exposure of PII or credentials. Additionally, there is a suggestion to improve code consistency and maintainability regarding data sanitization.
src/app/v1/_lib/proxy/errors.ts
Outdated
| const safe = clipped | ||
| .replace(/Bearer\s+[A-Za-z0-9._-]+/gi, "Bearer [REDACTED]") | ||
| .replace(/\b(?:sk|rk|pk)-[A-Za-z0-9_-]{16,}\b/giu, "[REDACTED_KEY]") | ||
| .replace(/\bAIza[0-9A-Za-z_-]{16,}\b/g, "[REDACTED_KEY]"); |
There was a problem hiding this comment.
The getClientSafeMessage function uses an incomplete redaction logic for sensitive information, missing patterns like JWTs, emails, and generic secrets. This could lead to the exposure of PII or credentials to the client. For consistent and more secure sanitization, it is recommended to use the comprehensive sanitizeErrorTextForDetail function from @/lib/utils/upstream-error-detection instead of custom regex replacements. This will ensure all sensitive patterns are properly redacted.
const safe = sanitizeErrorTextForDetail(clipped);There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@src/app/`[locale]/dashboard/logs/_components/error-details-dialog.test.tsx:
- Around line 256-264: The test is missing an assertion for the "Detected
reason" line; in the existing fake-200 test inside error-details-dialog.test.tsx
(the test that already asserts the forwarded notice), add an assertion that the
rendered output contains the fake200DetectedReason string with the specific
fake200Reasons[...] value used in that test (use the same reason key the test
supplies), e.g., assert that fake200DetectedReason.replace("{reason}",
fake200Reasons.<usedKey>) appears in the DOM alongside the forwarded notice;
update the test for the fake-200 case to include this extra expectation so the
new display logic is covered.
In
`@src/app/`[locale]/dashboard/logs/_components/error-details-dialog/components/LogicTraceTab.tsx:
- Around line 466-474: The subtitle currently hardcodes the "HTTP …" prefix;
update LogicTraceTab.tsx to use i18n by creating a translation key (e.g.
logicTrace.httpStatus) and call t(...) to format the status string instead of
literal `HTTP ${item.statusCode}`; ensure you still conditionally append the
inferred suffix via t("statusCodeInferredSuffix") when item.statusCodeInferred
is true and preserve the existing branches that fall back to item.name or
tChain(`reasons.${item.reason}`) so only the HTTP prefix is replaced with
t("logicTrace.httpStatus", { code: item.statusCode }) (or equivalent usage of t
for interpolation).
In `@src/app/`[locale]/dashboard/logs/_components/provider-chain-popover.test.tsx:
- Around line 89-100: Replace the single-quoted string for
jsonMessageKeywordMatch inside the fake200Reasons object with a double-quoted
string and escape the inner double quotes (e.g., change 'JSON message contains
"error"' to "JSON message contains \"error\""), then run Biome formatting so the
file (including fake200DetectedReason, statusCodeInferredBadge,
statusCodeInferredTooltip, statusCodeInferredSuffix and fake200Reasons) conforms
to the project's double-quote, trailing-commas, 2-space indent, and 100-char
width rules.
In `@src/app/v1/_lib/proxy/error-handler.ts`:
- Around line 252-286: The verbose-details block guarded by
shouldAttachVerboseDetails can throw if getCachedSystemSettings() or subsequent
property access fails; wrap the entire block (the await
getCachedSystemSettings() call and the branching that sets details for
ProxyError and isEmptyResponseError) in a try-catch so failures silently degrade
(leave details as undefined) and do not throw; keep the existing branches for
ProxyError and isEmptyResponseError (references: shouldAttachVerboseDetails,
getCachedSystemSettings, ProxyError, isEmptyResponseError,
sanitizeErrorTextForDetail, details, upstreamRequestId) and optionally log the
caught error as a non-fatal debug message rather than letting it propagate.
In `@src/app/v1/_lib/proxy/response-handler.ts`:
- Around line 133-140: The call to inferUpstreamErrorStatusCodeFromText on line
133 is failing because that function isn’t imported; add an import for
inferUpstreamErrorStatusCodeFromText to the top import block (alongside the
other imports) using the exact exported name so the file compiles and the
effectiveStatusCode/statusCodeInferenceMatcherId logic can use the function.
🧹 Nitpick comments (4)
tests/unit/proxy/error-handler-verbose-provider-error-details.test.ts (1)
36-46:createSession使用any类型构造最小化 session 对象。考虑到这是测试辅助函数,使用
any简化了构造过程且不影响生产代码,可以接受。但与proxy-forwarder-fake-200-html.test.ts中使用ProxySession.prototype的方式相比,这里的方式更脆弱——如果ProxyErrorHandler.handle将来访问更多 session 属性,这里不会有编译时提示。src/lib/utils/provider-chain-formatter.ts (1)
810-848:vendor_type_all_timeout分支未使用formatTimelineStatusCode,与其他分支不一致。第 816 行和第 837 行直接调用
t("timeline.statusCode", ...)而非formatTimelineStatusCode(item, ...)。虽然当前vendor_type_all_timeout的statusCodeInferred预期始终为false,但为保持一致性和未来扩展性,建议统一使用新的辅助函数。建议的修改
if (item.errorDetails?.provider) { const p = item.errorDetails.provider; timeline += `${t("timeline.provider", { provider: p.name })}\n`; - timeline += `${t("timeline.statusCode", { code: p.statusCode })}\n`; + timeline += `${formatTimelineStatusCode(item, p.statusCode, t)}\n`; timeline += `${t("timeline.error", { error: p.statusText })}\n`;} else { timeline += `${t("timeline.provider", { provider: item.name })}\n`; if (item.statusCode) { - timeline += `${t("timeline.statusCode", { code: item.statusCode })}\n`; + timeline += `${formatTimelineStatusCode(item, item.statusCode, t)}\n`; }src/app/[locale]/dashboard/logs/_components/provider-chain-popover.tsx (1)
63-78:getFake200ReasonKey在多个组件中重复定义,建议提取为共享工具函数。
provider-chain-popover.tsx和error-details-dialog/components/SummaryTab.tsx中存在相同实现的getFake200ReasonKey函数。虽然返回的 i18n 键前缀略有不同(logs.details.fake200Reasons.对fake200Reasons.),但核心逻辑完全相同。将其提取到共享模块(如@/lib/utils/fake-200-helpers.ts)中,可避免维护重复代码。后续新增 FAKE_200 类型时只需修改一处映射。src/app/v1/_lib/proxy/forwarder.ts (1)
780-795:shouldInspectBody在isJson && !hasValidContentLength时的覆盖存在隐含依赖当
isJson=true且hasValidContentLength=false时:
shouldInspectJson= false(因为hasValidContentLength为 false)- 但
shouldInspectBody= true(因为!hasValidContentLength)此时仍会读取 body 并在后续 lines 797-830 对 JSON 内容做
detectUpstreamErrorFromSseOrJsonText检测——这是正确的行为,但逻辑上shouldInspectJson的命名会让人误以为"JSON 不会被检查"。当前命名shouldInspectJson实际含义更接近"是否因 JSON 类型 主动 触发检查",而!hasValidContentLength是兜底路径。建议在注释中补充说明,或将变量名改为更明确的表述,以减少维护时的误读。
src/app/[locale]/dashboard/logs/_components/error-details-dialog.test.tsx
Show resolved
Hide resolved
src/app/[locale]/dashboard/logs/_components/error-details-dialog/components/LogicTraceTab.tsx
Show resolved
Hide resolved
| fake200DetectedReason: "Detected reason: {reason}", | ||
| statusCodeInferredBadge: "Inferred", | ||
| statusCodeInferredTooltip: "This status code is inferred from response body content.", | ||
| statusCodeInferredSuffix: "(inferred)", | ||
| fake200Reasons: { | ||
| emptyBody: "Empty response body", | ||
| htmlBody: "HTML document returned", | ||
| jsonErrorNonEmpty: "JSON has non-empty error field", | ||
| jsonErrorMessageNonEmpty: "JSON has non-empty error.message", | ||
| jsonMessageKeywordMatch: 'JSON message contains "error"', | ||
| unknown: "Response body indicates an error", | ||
| }, |
There was a problem hiding this comment.
请改为双引号以符合项目格式规范。
jsonMessageKeywordMatch 使用单引号会违背 Biome 的双引号规则,建议改为双引号并转义内部引号。
Proposed fix
- jsonMessageKeywordMatch: 'JSON message contains "error"',
+ jsonMessageKeywordMatch: "JSON message contains \"error\"",As per coding guidelines: Format code with Biome using: double quotes, trailing commas, 2-space indent, 100 character line width
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fake200DetectedReason: "Detected reason: {reason}", | |
| statusCodeInferredBadge: "Inferred", | |
| statusCodeInferredTooltip: "This status code is inferred from response body content.", | |
| statusCodeInferredSuffix: "(inferred)", | |
| fake200Reasons: { | |
| emptyBody: "Empty response body", | |
| htmlBody: "HTML document returned", | |
| jsonErrorNonEmpty: "JSON has non-empty error field", | |
| jsonErrorMessageNonEmpty: "JSON has non-empty error.message", | |
| jsonMessageKeywordMatch: 'JSON message contains "error"', | |
| unknown: "Response body indicates an error", | |
| }, | |
| fake200DetectedReason: "Detected reason: {reason}", | |
| statusCodeInferredBadge: "Inferred", | |
| statusCodeInferredTooltip: "This status code is inferred from response body content.", | |
| statusCodeInferredSuffix: "(inferred)", | |
| fake200Reasons: { | |
| emptyBody: "Empty response body", | |
| htmlBody: "HTML document returned", | |
| jsonErrorNonEmpty: "JSON has non-empty error field", | |
| jsonErrorMessageNonEmpty: "JSON has non-empty error.message", | |
| jsonMessageKeywordMatch: "JSON message contains \"error\"", | |
| unknown: "Response body indicates an error", | |
| }, |
🤖 Prompt for AI Agents
In `@src/app/`[locale]/dashboard/logs/_components/provider-chain-popover.test.tsx
around lines 89 - 100, Replace the single-quoted string for
jsonMessageKeywordMatch inside the fake200Reasons object with a double-quoted
string and escape the inner double quotes (e.g., change 'JSON message contains
"error"' to "JSON message contains \"error\""), then run Biome formatting so the
file (including fake200DetectedReason, statusCodeInferredBadge,
statusCodeInferredTooltip, statusCodeInferredSuffix and fake200Reasons) conforms
to the project's double-quote, trailing-commas, 2-space indent, and 100-char
width rules.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/app/v1/_lib/proxy/response-handler.ts`:
- Around line 255-272: The 404 exemption is inconsistent: update the Gemini
passthrough and non-stream failure branches (the places that currently check if
(statusCode >= 400)) to match the Deferred path's logic by excluding 404 from
circuit-breaker recording; specifically, change those conditions to skip calling
recordFailure (or invoking the circuit-breaker import) when statusCode === 404
(or use the same effectiveStatusCode/chainReason logic), and add/update a
comment mirroring the Deferred path note that 404 denotes RESOURCE_NOT_FOUND and
should not be counted as provider failure; ensure the same recordFailure usage
(imported "@/lib/circuit-breaker" and recordFailure(meta.providerId, new
Error(detected.code))) is applied after the condition change.
🧹 Nitpick comments (1)
src/app/v1/_lib/proxy/response-handler.ts (1)
305-334: 非 200 流式路径的 404 处理逻辑一致,LGTM。正确复用了与假 200 路径相同的 404 豁免策略,且未传递
statusCodeInferred(因为此路径中状态码是实际 HTTP 状态,非推断值)。小建议:第 255 行和第 305 行的
chainReason推导逻辑完全相同,可考虑提取为辅助函数以减少重复。♻️ 可选重构:提取 chainReason 推导
+function deriveChainReason(effectiveStatusCode: number): "resource_not_found" | "retry_failed" { + return effectiveStatusCode === 404 ? "resource_not_found" : "retry_failed"; +} + // Line 255: -const chainReason = effectiveStatusCode === 404 ? "resource_not_found" : "retry_failed"; +const chainReason = deriveChainReason(effectiveStatusCode); // Line 305: -const chainReason = effectiveStatusCode === 404 ? "resource_not_found" : "retry_failed"; +const chainReason = deriveChainReason(effectiveStatusCode);
| { | ||
| statusCode: 404, | ||
| matcherId: "not_found", | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+404\b|\bnot\s+found\b|\b(?:model|deployment|endpoint|resource)\s+not\s+found\b|\bunknown\s+model\b|\bdoes\s+not\s+exist\b|\bNOT_FOUND\b|\bResourceNotFoundException\b|未找到|不存在|模型不存在)/iu, |
There was a problem hiding this comment.
Broad not_found regex may over-match
The bare \bnot\s+found\b alternative in this regex is quite broad. Since this matcher runs on the full response body text (up to 64 KiB), it could match on error messages like "Solution not found" or "Key not found in cache" that are not HTTP 404 semantics, causing a fake-200 to be incorrectly inferred as 404 instead of the default 502.
While this only runs after detectUpstreamErrorFromSseOrJsonText has confirmed isError=true, a mis-inferred 404 has meaningful consequences: it changes the chain reason to resource_not_found and skips circuit breaker recording (since 404 is treated as "resource doesn't exist, not a provider fault").
Consider tightening the bare match to require more context, e.g. requiring it to appear near an HTTP status reference, or removing the bare \bnot\s+found\b and relying on the more specific alternatives (model not found, endpoint not found, NOT_FOUND, ResourceNotFoundException, etc.).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/utils/upstream-error-detection.ts
Line: 115:118
Comment:
**Broad `not_found` regex may over-match**
The bare `\bnot\s+found\b` alternative in this regex is quite broad. Since this matcher runs on the full response body text (up to 64 KiB), it could match on error messages like `"Solution not found"` or `"Key not found in cache"` that are not HTTP 404 semantics, causing a fake-200 to be incorrectly inferred as 404 instead of the default 502.
While this only runs after `detectUpstreamErrorFromSseOrJsonText` has confirmed `isError=true`, a mis-inferred 404 has meaningful consequences: it changes the chain reason to `resource_not_found` and **skips** circuit breaker recording (since 404 is treated as "resource doesn't exist, not a provider fault").
Consider tightening the bare match to require more context, e.g. requiring it to appear near an HTTP status reference, or removing the bare `\bnot\s+found\b` and relying on the more specific alternatives (`model not found`, `endpoint not found`, `NOT_FOUND`, `ResourceNotFoundException`, etc.).
How can I resolve this? If you propose a fix, please make it concise.| statusCode: 409, | ||
| matcherId: "conflict", | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+409\b|\bconflict\b|\bidempotency(?:-key)?\b|\bABORTED\b|冲突|幂等)/iu, |
There was a problem hiding this comment.
Broad conflict regex may over-match
The \bconflict\b alternative is very broad and could match on error messages that happen to contain the word "conflict" in a non-409 context (e.g., "merge conflict detected" in a CI/CD error page, or "version conflict" in a cache layer error). Since 409 Conflict is rarely returned by LLM API providers, the risk is low, but worth noting for future maintenance.
Consider requiring more specific context, e.g. \bHTTP\/\d(?:\.\d)?\s+409\b|\bidempotency(?:-key)?\b|\bABORTED\b without the bare \bconflict\b.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/utils/upstream-error-detection.ts
Line: 131:133
Comment:
**Broad `conflict` regex may over-match**
The `\bconflict\b` alternative is very broad and could match on error messages that happen to contain the word "conflict" in a non-409 context (e.g., `"merge conflict detected"` in a CI/CD error page, or `"version conflict"` in a cache layer error). Since 409 Conflict is rarely returned by LLM API providers, the risk is low, but worth noting for future maintenance.
Consider requiring more specific context, e.g. `\bHTTP\/\d(?:\.\d)?\s+409\b|\bidempotency(?:-key)?\b|\bABORTED\b` without the bare `\bconflict\b`.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| let statusCodeInferenceMatcherId: string | undefined; | ||
| if (detected.isError) { | ||
| effectiveStatusCode = 502; | ||
| const inferred = inferUpstreamErrorStatusCodeFromText(allContent); |
There was a problem hiding this comment.
[Critical] [LOGIC-BUG] Missing import for inferUpstreamErrorStatusCodeFromText
src/app/v1/_lib/proxy/response-handler.ts:133
Problematic code:
const inferred = inferUpstreamErrorStatusCodeFromText(allContent);Why this is a problem: inferUpstreamErrorStatusCodeFromText is not imported in this module, so this will fail compilation and block the build.
Suggested fix:
import {
detectUpstreamErrorFromSseOrJsonText,
inferUpstreamErrorStatusCodeFromText,
} from "@/lib/utils/upstream-error-detection";Confidence: 95/100
src/app/v1/_lib/proxy/forwarder.ts
Outdated
| if (truncated) { | ||
| try { | ||
| await reader.cancel(); | ||
| } catch { |
There was a problem hiding this comment.
[High] [ERROR-SILENT] readResponseTextUpTo swallows cleanup errors
src/app/v1/_lib/proxy/forwarder.ts:151 and src/app/v1/_lib/proxy/forwarder.ts:158
Problematic code:
} catch {
// ignore
}Why this is a problem: If reader.cancel() / reader.releaseLock() fails, the error is silently swallowed, which hides stream/reader issues and makes production failures harder to diagnose.
Suggested fix (log at debug-level and keep behavior the same):
} catch (cancelError) {
logger.debug("ProxyForwarder: Failed to cancel non-stream inspection reader", {
cancelError,
});
}
// ...
} catch (releaseError) {
logger.debug("ProxyForwarder: Failed to release non-stream inspection reader lock", {
releaseError,
});
}Confidence: 85/100
Code Review SummaryThis PR implements comprehensive "fake 200" error detection and logging improvements for the proxy system. The changes significantly enhance the system's ability to detect when upstream providers return HTTP 200 with error content (HTML error pages, JSON error objects, or empty bodies). PR Size: XL
Critical Issues Found
Fix required in // Add these imports at the top of the file
import { getCachedSystemSettings } from "@/lib/config/system-settings-cache";
import { sanitizeErrorTextForDetail } from "@/lib/utils/upstream-error-detection";Impact: Without these imports, the error handler will throw runtime errors when processing FAKE_200 errors with Review Coverage
Notable Positive Aspects
RecommendationRequest changes - Please add the missing imports in Automated review by Claude AI |
There was a problem hiding this comment.
Code Review Summary
This PR makes broad changes across core proxy error classification (fake-200 detection + inferred status codes), dashboard rendering/i18n strings, and adds a large amount of new tests. Two high-confidence issues were found: a missing import that will block compilation, and new silent catch blocks that swallow cleanup errors.
PR Size: XL
- Lines changed: 1996
- Files changed: 29
- Split suggestion: Separate into (1) proxy fake-200 detection/inference changes (+ unit tests), (2) dashboard UI + i18n message updates, (3) provider-chain formatter changes (+ formatter tests).
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 1 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 1 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
Critical Issues (Must Fix)
src/app/v1/_lib/proxy/response-handler.ts:133— Missing import forinferUpstreamErrorStatusCodeFromText(Confidence: 95/100)
High Priority Issues (Should Fix)
src/app/v1/_lib/proxy/forwarder.ts:151—readResponseTextUpToswallows cleanup errors (reader.cancel()/reader.releaseLock()) (Confidence: 85/100)
Review Coverage
- Logic and correctness
- Security (OWASP Top 10)
- Error handling
- Type safety
- Documentation accuracy
- Test coverage
- Code clarity
Automated review by Codex AI
There was a problem hiding this comment.
- Identified open PR
#790and applied the labelsize/XL(1996 lines changed across 29 files). - Posted 2 inline review comments on diff lines:
src/app/v1/_lib/proxy/response-handler.ts:133— Critical missing import forinferUpstreamErrorStatusCodeFromText(build-blocking).src/app/v1/_lib/proxy/forwarder.ts:151— High silentcatch { /* ignore */ }swallowingreader.cancel()/reader.releaseLock()cleanup errors.
- Submitted the required “Code Review Summary” as a PR review comment via
gh pr review.
* fix(proxy): 识别 200+HTML 假200并触发故障转移 * fix(utils): 收紧 HTML 文档识别避免误判 * fix(proxy): 非流式假200补齐强信号 JSON error 检测 * fix(utils): 假200检测兼容 BOM * perf(proxy): 降低非流式嗅探读取上限 * fix(proxy): 客户端隐藏 FAKE_200_* 内部码 * fix(logs): 补齐 endpoint_pool_exhausted/404 错因展示 - endpoint_pool_exhausted 写入 attemptNumber,避免被 initial_selection/session_reuse 去重吞掉\n- 决策链/技术时间线补齐 resource_not_found 的失败态与说明\n- 更新 provider-chain i18n 文案并新增单测覆盖 * fix(proxy): 非流式 JSON 假200检测覆盖 Content-Length - 对 application/json 且 Content-Length<=32KiB 的 2xx 响应也做强信号嗅探\n- 补齐 200+JSON error(带 Content-Length)触发故障转移的回归测试 * chore: format code (fix-issue-749-fake-200-html-detection-005fad3) * fix(i18n): 修正 ru 端点池耗尽文案 - 修正俄语中 endpoint 的复数属格拼写(конечных точек)\n- 不影响 key,仅更新展示文案 * test(formatter): 补齐 resource_not_found 组合场景覆盖 - 覆盖 resource_not_found + retry_success 多供应商链路\n- 覆盖缺少 errorDetails.provider 的降级渲染路径 * fix(proxy): FAKE_200 客户端提示附带脱敏片段 * fix: 改进 FAKE_200 错误原因提示 * fix(proxy): verboseProviderError 回传假200原文 - fake-200/空响应:verboseProviderError 开启时在 error.details 返回详细报告与上游原文(不落库)\n- forwarder: 将检测到的原文片段挂到 ProxyError.upstreamError.rawBody\n- tests: 覆盖 verbose details 与 rawBody 透传 * fix(proxy): 强化 Content-Length 校验与假200片段防泄露 - forwarder: 将非法 Content-Length 视为无效,避免漏检 HTML/空响应\n- errors: FAKE_200 客户端 detail 二次截断 + 轻量脱敏(防御性)\n- tests: 覆盖非法 Content-Length 漏检回归 * docs(proxy): 说明非流式假200检测上限 * docs(settings): 补充 verboseProviderError 安全提示 * fix(proxy): verboseProviderError rawBody 基础脱敏 * chore: format code (fix-issue-749-fake-200-html-detection-b56b790) * docs(settings): 说明 verboseProviderError 基础脱敏 * fix(proxy/logs): 假200 推断状态码并显著标记 * fix(i18n): 回退 verboseProviderErrorDesc 原始文案 * fix(stream): 404 资源不存在不计入熔断 --------- Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ding <44717411+ding113@users.noreply.github.com>
The function inferUpstreamErrorStatusCodeFromText was used in response-handler.ts but was not imported, causing a TypeScript compilation error during build. Fixed: - Added inferUpstreamErrorStatusCodeFromText to imports from @/lib/utils/upstream-error-detection CI Run: https://github.com/ding113/claude-code-hub/actions/runs/22033028838
…ing sanitization Extract duplicated getFake200ReasonKey() from SummaryTab and ProviderChainPopover into a shared fake200-reason.ts utility, eliminating the risk of silent drift when new FAKE_200_* codes are added. Replace the 3-pattern manual sanitization in getClientSafeMessage() with the existing sanitizeErrorTextForDetail() (6 patterns), closing a gap where JWT tokens, emails, and password/config paths could leak to clients via the FAKE_200 error detail path. Add unit tests verifying JWT, email, and password sanitization.
83bf797 to
3db096d
Compare
🧪 测试结果
总体结果: ✅ 所有测试通过 |
| { | ||
| statusCode: 500, | ||
| matcherId: "internal_server_error", | ||
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+500\b|\binternal\s+server\s+error\b|\bInternalServerException\b|\bINTERNAL\b|内部错误|服务器错误)/iu, | ||
| }, |
There was a problem hiding this comment.
Broad \bINTERNAL\b with case-insensitive flag
With the /iu flags, \bINTERNAL\b matches the word "internal" in any case (e.g., "internal configuration error", "internal model inference failed"). While this matcher is near the bottom of the priority list (position 14 of 15) and only runs on already-confirmed errors, a match could cause a mis-categorization as 500 when the actual error might be better mapped to a different status code (e.g., a 400-level error that happens to contain the word "internal").
Consider making this case-sensitive (matching only gRPC-style INTERNAL) or requiring more context, similar to how other matchers require surrounding keywords:
| { | |
| statusCode: 500, | |
| matcherId: "internal_server_error", | |
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+500\b|\binternal\s+server\s+error\b|\bInternalServerException\b|\bINTERNAL\b|内部错误|服务器错误)/iu, | |
| }, | |
| { | |
| statusCode: 500, | |
| matcherId: "internal_server_error", | |
| re: /(?:\bHTTP\/\d(?:\.\d)?\s+500\b|\binternal\s+server\s+error\b|\bInternalServerException\b|\bINTERNAL\b(?<![a-z])|内部错误|服务器错误)/iu, | |
| }, |
Note: the suggestion above uses a negative lookbehind (?<![a-z]) to match INTERNAL only when not preceded by a lowercase letter, which would require the all-caps form. Alternatively, you could replace \bINTERNAL\b with a standalone case-sensitive check or require it to appear as a standalone gRPC status (e.g., "status":\s*"INTERNAL"). This is low-priority given its position in the matcher list.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/utils/upstream-error-detection.ts
Line: 160:164
Comment:
**Broad `\bINTERNAL\b` with case-insensitive flag**
With the `/iu` flags, `\bINTERNAL\b` matches the word "internal" in any case (e.g., `"internal configuration error"`, `"internal model inference failed"`). While this matcher is near the bottom of the priority list (position 14 of 15) and only runs on already-confirmed errors, a match could cause a mis-categorization as 500 when the actual error might be better mapped to a different status code (e.g., a 400-level error that happens to contain the word "internal").
Consider making this case-sensitive (matching only gRPC-style `INTERNAL`) or requiring more context, similar to how other matchers require surrounding keywords:
```suggestion
{
statusCode: 500,
matcherId: "internal_server_error",
re: /(?:\bHTTP\/\d(?:\.\d)?\s+500\b|\binternal\s+server\s+error\b|\bInternalServerException\b|\bINTERNAL\b(?<![a-z])|内部错误|服务器错误)/iu,
},
```
Note: the suggestion above uses a negative lookbehind `(?<![a-z])` to match `INTERNAL` only when not preceded by a lowercase letter, which would require the all-caps form. Alternatively, you could replace `\bINTERNAL\b` with a standalone case-sensitive check or require it to appear as a standalone gRPC status (e.g., `"status":\s*"INTERNAL"`). This is low-priority given its position in the matcher list.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/app/v1/_lib/proxy/error-handler.ts`:
- Around line 257-260: 当前对 error.upstreamError.rawBody 的处理会对非空字符串先调用
sanitizeErrorTextForDetail 但不限制长度,导致超长内容(如大 HTML)可被放入返回的 details。修改 rawBody
的赋值流程:若为字符串,先调用 sanitizeErrorTextForDetail(rawBody) 再对结果执行截断(复用或与
getClientSafeMessage() 相同的最大长度常量/策略),将截断后的字符串赋回用于 details;保留对非字符串 rawBody
的原有行为。使用标识符:rawBody、sanitizeErrorTextForDetail、getClientSafeMessage、details。
In `@src/lib/utils/upstream-error-detection.ts`:
- Around line 94-170: The not_found matcher in ERROR_STATUS_MATCHERS is too
broad (it uses \bnot\s+found\b) and can false-positive generic messages like
"Configuration key not found"; update the entry with matcherId "not_found"
inside ERROR_STATUS_MATCHERS to remove the generic \bnot\s+found\b and tighten
the re to only match contextual patterns (e.g.,
\b(model|deployment|endpoint|resource|service|user)\s+not\s+found\b and explicit
HTTP\/\d+\s+404 patterns) so only real resource-missing responses trigger the
404 classification.
🧹 Nitpick comments (4)
src/lib/utils/provider-chain-formatter.ts (1)
813-837:vendor_type_all_timeout分支未使用formatTimelineStatusCode,与其他分支不一致。
retry_failed、resource_not_found、client_error_non_retryable均已统一使用formatTimelineStatusCode,但vendor_type_all_timeout的两处(第816行和第837行)仍直接调用t("timeline.statusCode", ...)。虽然当前 vendor_type_all_timeout 不太可能出现statusCodeInferred=true的场景,但为了一致性和未来扩展性,建议统一使用 helper。建议修改
if (item.errorDetails?.provider) { const p = item.errorDetails.provider; timeline += `${t("timeline.provider", { provider: p.name })}\n`; - timeline += `${t("timeline.statusCode", { code: p.statusCode })}\n`; + timeline += `${formatTimelineStatusCode(item, p.statusCode, t)}\n`; timeline += `${t("timeline.error", { error: p.statusText })}\n`;} else { timeline += `${t("timeline.provider", { provider: item.name })}\n`; if (item.statusCode) { - timeline += `${t("timeline.statusCode", { code: item.statusCode })}\n`; + timeline += `${formatTimelineStatusCode(item, item.statusCode, t)}\n`; }src/app/v1/_lib/proxy/errors.ts (1)
483-528:getClientSafeMessage()中 FAKE_200 分支的防御性处理值得肯定,但有一个小建议。整体逻辑合理:reason 映射 → inferred note → 防御性二次脱敏/截断。不过
maxChars = 200是一个有业务含义的阈值,建议提取为模块级常量,方便后续统一调整。建议提取常量
+/** getClientSafeMessage() 中"上游 detail 片段"的最大字符数 */ +const CLIENT_SAFE_DETAIL_MAX_CHARS = 200; + // 在 getClientSafeMessage 内部 - const maxChars = 200; + const maxChars = CLIENT_SAFE_DETAIL_MAX_CHARS;src/lib/utils/upstream-error-detection.ts (1)
160-164:internal_server_error的\bINTERNAL\b匹配过宽。
\bINTERNAL\b会匹配任何包含 "internal" 单词的文本(因为/iu标志),包括"INTERNAL_ERROR"以及"An internal process has..."等非 500 的上下文。由于该规则排在列表靠后位置(优先级低),实际误判概率不高,但仍建议收窄为更具上下文的模式,如\binternal\s+(?:server\s+)?error\b或\bINTERNAL_ERROR\b。src/app/v1/_lib/proxy/forwarder.ts (1)
832-839: 缺失 Content-Length 时,inspectedText保证已被赋值,但依赖隐式推导——建议添加断言注释。当进入
!contentLength || !hasValidContentLength分支时,shouldInspectBody一定为true(因为!hasValidContentLength是其条件之一),所以inspectedText一定已被赋值。但这个推导链不够显式,后续维护者可能不清楚为何inspectedText ?? ""的 fallback 不会被触发。建议添加注释
if (!contentLength || !hasValidContentLength) { - const responseText = inspectedText ?? ""; + // shouldInspectBody 包含 !hasValidContentLength 条件,所以此处 inspectedText 一定已被赋值。 + // ?? "" 仅作防御性兜底。 + const responseText = inspectedText ?? "";
- Add i18n for HTTP status prefix in LogicTraceTab (5 languages) - Wrap verbose details gathering in try-catch to prevent cascading failures - Truncate rawBody to 4096 chars before sanitization in error-handler - Tighten not_found regex to require contextual prefixes, preventing false 404 inference - Add debug logging to silent catch blocks in readResponseTextUpTo - Add test assertion for fake200DetectedReason display
🧪 测试结果
总体结果: ✅ 所有测试通过 |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/app/`[locale]/dashboard/logs/_components/error-details-dialog.test.tsx:
- Around line 259-264: The string value for the jsonMessageKeywordMatch entry
uses single quotes and should use double quotes according to the project's Biome
formatting rules; update the value for jsonMessageKeywordMatch in the same
object (the mapping with keys emptyBody, htmlBody, jsonErrorNonEmpty,
jsonErrorMessageNonEmpty, jsonMessageKeywordMatch, unknown) to use double quotes
and escape the internal quotes (e.g., JSON message contains \"error\"),
preserving the trailing comma and existing indentation.
🧹 Nitpick comments (3)
src/lib/utils/upstream-error-detection.ts (1)
130-134:conflict匹配器中\bconflict\b覆盖面偏宽
\bconflict\b会匹配任何包含 "conflict" 的错误文本(如"merge conflict in configuration"或"conflicting parameters"),可能导致非 409 场景被误推断为 409。虽然该函数仅在已判定isError=true后才调用,但误推断的状态码会影响熔断/故障转移决策。建议收紧为带上下文的模式,例如
\b(?:version|resource|state|write)\s+conflict\b或\bidempotency[-_ ]?conflict\b,与not_found的收紧策略保持一致。src/app/v1/_lib/proxy/error-handler.ts (1)
248-250:startsWith("FAKE_200_")与FAKE_200_CODES常量存在隐式耦合
shouldAttachVerboseDetails依赖error.message.startsWith("FAKE_200_")判断是否为假 200 错误。这与upstream-error-detection.ts中FAKE_200_CODES的值格式存在隐式耦合——如果后续有人修改了 code 前缀,此处会静默失效。建议从
upstream-error-detection.ts导出一个辅助判断函数(如isFake200Code(code: string): boolean)或导出FAKE_200_CODES常量,让两处逻辑通过同一来源保持一致。建议方案
在
upstream-error-detection.ts中新增:export function isFake200Code(code: string): boolean { return Object.values(FAKE_200_CODES).includes(code as (typeof FAKE_200_CODES)[keyof typeof FAKE_200_CODES]); }然后在
error-handler.ts中:- const shouldAttachVerboseDetails = - (error instanceof ProxyError && error.message.startsWith("FAKE_200_")) || - isEmptyResponseError(error); + const shouldAttachVerboseDetails = + (error instanceof ProxyError && isFake200Code(error.message)) || + isEmptyResponseError(error);src/app/v1/_lib/proxy/forwarder.ts (1)
780-784:shouldInspectBody条件:未覆盖非标准 Content-Type 的假 200 场景当 Content-Type 既非 HTML、也非 JSON,且有合法 Content-Length 时(例如
text/plain或text/xml),shouldInspectBody为false,不会进行假 200 检测。根据注释,这是有意为之(避免对大体积非 JSON 响应做无谓检查)。但如果上游以
text/plain返回 200 + 错误文本,此路径会漏检。建议在注释中补充说明这一设计取舍,便于后续维护者理解。
| emptyBody: "Empty response body", | ||
| htmlBody: "HTML document returned", | ||
| jsonErrorNonEmpty: "JSON has non-empty error field", | ||
| jsonErrorMessageNonEmpty: "JSON has non-empty error.message", | ||
| jsonMessageKeywordMatch: 'JSON message contains "error"', | ||
| unknown: "Response body indicates an error", |
There was a problem hiding this comment.
字符串引号需统一为双引号
Line 263 使用单引号,违反当前代码格式规范。建议改为双引号并转义内部引号。
修改建议
- jsonMessageKeywordMatch: 'JSON message contains "error"',
+ jsonMessageKeywordMatch: "JSON message contains \"error\"",As per coding guidelines: Format code with Biome using: double quotes, trailing commas, 2-space indent, 100 character line width.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| emptyBody: "Empty response body", | |
| htmlBody: "HTML document returned", | |
| jsonErrorNonEmpty: "JSON has non-empty error field", | |
| jsonErrorMessageNonEmpty: "JSON has non-empty error.message", | |
| jsonMessageKeywordMatch: 'JSON message contains "error"', | |
| unknown: "Response body indicates an error", | |
| emptyBody: "Empty response body", | |
| htmlBody: "HTML document returned", | |
| jsonErrorNonEmpty: "JSON has non-empty error field", | |
| jsonErrorMessageNonEmpty: "JSON has non-empty error.message", | |
| jsonMessageKeywordMatch: "JSON message contains \"error\"", | |
| unknown: "Response body indicates an error", |
🤖 Prompt for AI Agents
In `@src/app/`[locale]/dashboard/logs/_components/error-details-dialog.test.tsx
around lines 259 - 264, The string value for the jsonMessageKeywordMatch entry
uses single quotes and should use double quotes according to the project's Biome
formatting rules; update the value for jsonMessageKeywordMatch in the same
object (the mapping with keys emptyBody, htmlBody, jsonErrorNonEmpty,
jsonErrorMessageNonEmpty, jsonMessageKeywordMatch, unknown) to use double quotes
and escape the internal quotes (e.g., JSON message contains \"error\"),
preserving the trailing comma and existing indentation.
| if (!contentLength || !hasValidContentLength) { | ||
| const responseText = inspectedText ?? ""; | ||
|
|
||
| if (!responseText || responseText.trim() === "") { | ||
| throw new EmptyResponseError(currentProvider.id, currentProvider.name, "empty_body"); | ||
| } |
There was a problem hiding this comment.
Missing body read when inspectedText is undefined
When shouldInspectBody is false (e.g., valid Content-Length + non-HTML/non-JSON content type), inspectedText remains undefined. This branch then checks !responseText where responseText = inspectedText ?? "", which evaluates to empty string, causing an EmptyResponseError to be thrown even when the response has a valid non-zero Content-Length.
Example: Content-Type: text/plain, Content-Length: 100 would skip body inspection but then throw empty response error.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 834:839
Comment:
Missing body read when `inspectedText` is undefined
When `shouldInspectBody` is `false` (e.g., valid Content-Length + non-HTML/non-JSON content type), `inspectedText` remains `undefined`. This branch then checks `!responseText` where `responseText = inspectedText ?? ""`, which evaluates to empty string, causing an `EmptyResponseError` to be thrown even when the response has a valid non-zero Content-Length.
Example: `Content-Type: text/plain`, `Content-Length: 100` would skip body inspection but then throw empty response error.
How can I resolve this? If you propose a fix, please make it concise.| const shouldInspectJson = | ||
| isJson && | ||
| hasValidContentLength && | ||
| contentLengthBytes <= NON_STREAM_BODY_INSPECTION_MAX_BYTES; | ||
| const shouldInspectBody = isHtml || !hasValidContentLength || shouldInspectJson; |
There was a problem hiding this comment.
Body inspection logic may be redundant for valid Content-Length
When hasValidContentLength is true and contentLengthBytes > 0, the code still inspects the body if content type is HTML or if it's a small JSON. However, if a valid Content-Length header exists and is non-zero, the empty response check below (line 837) is already protected. Consider whether HTML/JSON inspection is needed when Content-Length is valid and non-zero, or clarify with comments.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 780:784
Comment:
Body inspection logic may be redundant for valid Content-Length
When `hasValidContentLength` is true and `contentLengthBytes > 0`, the code still inspects the body if content type is HTML or if it's a small JSON. However, if a valid Content-Length header exists and is non-zero, the empty response check below (line 837) is already protected. Consider whether HTML/JSON inspection is needed when Content-Length is valid and non-zero, or clarify with comments.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| type CacheTtlOption = CacheTtlPreference | null | undefined; | ||
|
|
||
| // 非流式响应体检查的上限(字节):避免上游在 2xx 场景返回超大内容导致内存占用失控。 | ||
| // 说明: | ||
| // - 该检查仅用于“空响应/假 200”启发式判定,不用于业务逻辑解析; | ||
| // - 超过上限时,仍认为“非空”,但会跳过 JSON 内容结构检查(避免截断导致误判)。 | ||
| const NON_STREAM_BODY_INSPECTION_MAX_BYTES = 32 * 1024; // 32 KiB | ||
|
|
||
| /** | ||
| * 读取响应体文本,但最多读取 `maxBytes` 字节(用于非流式 2xx 的“空响应/假 200”嗅探)。 | ||
| * | ||
| * 注意: | ||
| * - 该函数只用于启发式检测,不用于业务逻辑解析; | ||
| * - 超过上限时会 `cancel()` reader,避免继续占用资源; | ||
| * - 调用方应使用 `response.clone()`,避免消费掉原始响应体,影响后续透传/解析。 | ||
| */ | ||
| async function readResponseTextUpTo( | ||
| response: Response, | ||
| maxBytes: number | ||
| ): Promise<{ text: string; truncated: boolean }> { | ||
| const reader = response.body?.getReader(); | ||
| if (!reader) { | ||
| return { text: "", truncated: false }; | ||
| } | ||
|
|
||
| const decoder = new TextDecoder(); | ||
| const chunks: string[] = []; | ||
| let bytesRead = 0; | ||
| let truncated = false; | ||
|
|
||
| try { | ||
| while (true) { | ||
| const { done, value } = await reader.read(); | ||
| if (done) break; | ||
| if (!value || value.byteLength === 0) continue; | ||
|
|
||
| const remaining = maxBytes - bytesRead; | ||
| // 注意:remaining<=0 发生在“已经读到下一块 chunk”之后。 | ||
| // 对启发式嗅探而言,直接标记 truncated 并退出即可(等价于丢弃超出上限的后续字节), | ||
| // 避免对超出部分做无谓的解码开销。 | ||
| if (remaining <= 0) { | ||
| truncated = true; | ||
| break; | ||
| } | ||
|
|
||
| if (value.byteLength > remaining) { | ||
| chunks.push(decoder.decode(value.subarray(0, remaining), { stream: true })); | ||
| bytesRead += remaining; | ||
| truncated = true; | ||
| break; | ||
| } | ||
|
|
||
| chunks.push(decoder.decode(value, { stream: true })); | ||
| bytesRead += value.byteLength; | ||
| } | ||
|
|
||
| const flushed = decoder.decode(); | ||
| if (flushed) chunks.push(flushed); | ||
| } finally { | ||
| if (truncated) { | ||
| try { | ||
| await reader.cancel(); | ||
| } catch (cancelErr) { | ||
| logger.debug("readResponseTextUpTo: failed to cancel reader", { error: cancelErr }); | ||
| } | ||
| } | ||
|
|
||
| try { | ||
| reader.releaseLock(); | ||
| } catch (releaseErr) { | ||
| logger.debug("readResponseTextUpTo: failed to release reader lock", { error: releaseErr }); | ||
| } | ||
| } | ||
|
|
||
| return { text: chunks.join(""), truncated }; |
There was a problem hiding this comment.
Consider extracting readResponseTextUpTo to a shared utility
This 75-line function provides reusable stream reading functionality that could benefit other parts of the codebase. Consider moving it to a utility module (e.g., src/lib/utils/response-reader.ts) for better code organization and reusability.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 89:163
Comment:
Consider extracting `readResponseTextUpTo` to a shared utility
This 75-line function provides reusable stream reading functionality that could benefit other parts of the codebase. Consider moving it to a utility module (e.g., `src/lib/utils/response-reader.ts`) for better code organization and reusability.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
Additional Comments (2)
Prompt To Fix With AIThis is a comment left during a code review.
Path: src/lib/utils/upstream-error-detection.ts
Line: 245:245
Comment:
Bearer token regex may not cover all formats
`/Bearer\s+[A-Za-z0-9._-]+/gi` won't match tokens containing other valid base64url characters or longer character sets. Some providers use different token formats. Verify this covers your actual token formats, or document known limitations.
How can I resolve this? If you propose a fix, please make it concise.
Pattern only matches key-value pairs with Prompt To Fix With AIThis is a comment left during a code review.
Path: src/lib/utils/upstream-error-detection.ts
Line: 262:264
Comment:
Sensitive key-value regex requires colon or equals
Pattern only matches key-value pairs with `:` or `=`. Won't sanitize JSON format like `{"password":"value"}` or certain query params. Consider handling quoted JSON values separately.
How can I resolve this? If you propose a fix, please make it concise. |
Summary
This PR significantly improves fake 200 error detection and logging throughout the proxy system. Upstream providers sometimes return HTTP 200 with error content (HTML error pages, JSON error objects, or empty bodies), which previously bypassed circuit breakers and caused incorrect session binding.
Problem
When upstream providers return HTTP 200 with error content (Cloudflare/WAF challenge pages, authentication failures in JSON, empty responses), the proxy incorrectly recorded these as successes. This caused:
Related Issues:
Solution
Core Detection Logic:
upstream-error-detection.tsutility detects fake 200s via HTML document sniffing (DOCTYPE/html tags), JSON error field inspection, and empty body checksStatus Code Inference:
Security:
verboseProviderErrorsystem setting is enabledChanges
Core Changes
src/lib/utils/upstream-error-detection.ts- Adds fake 200 detection and status code inferencesrc/app/v1/_lib/proxy/forwarder.ts- Non-streaming fake 200 detection with Content-Type inspectionsrc/app/v1/_lib/proxy/response-handler.ts- Streaming SSE fake 200 detection during finalizationsrc/app/v1/_lib/proxy/error-handler.ts- Verbose error details with sanitizationsrc/app/v1/_lib/proxy/errors.ts- Extended ProxyError with statusCodeInferred fieldsUI Changes
src/app/[locale]/dashboard/logs/_components/provider-chain-popover.tsx- Displays inferred status badgesTests
tests/unit/proxy/proxy-forwarder-fake-200-html.test.ts- 536-line test suite for HTML detectionsrc/lib/utils/upstream-error-detection.test.ts- Status inference pattern testsBreaking Changes
None. This is a purely additive detection enhancement.
Testing
Automated Tests
Manual Testing
Description enhanced by Claude AI
Greptile Summary
This PR implements comprehensive fake-200 error detection throughout the proxy system, addressing cases where upstream providers return HTTP 200 with error content (HTML pages, JSON errors, or empty bodies). The implementation properly distinguishes between streaming and non-streaming responses, with deferred finalization for SSE and immediate detection for non-streaming requests.
Key improvements:
upstream-error-detection.tsutility with HTML document sniffing, JSON error field inspection, and semantic status code inference (14+ patterns covering 429, 401, 403, 402, 404, etc.)Test coverage:
Issues found:
The architecture is sound with proper separation of concerns, thorough error handling, and extensive test coverage. The one logic bug should be fixed before merge.
Confidence Score: 4/5
Important Files Changed
Flowchart
flowchart TD A[Upstream Response<br/>HTTP 200] --> B{Content-Type?} B -->|text/event-stream| C[SSE Stream] B -->|text/html or<br/>application/json| D[Non-Stream] B -->|Other| D C --> C1[Defer Finalization<br/>to ResponseHandler] C1 --> C2[Stream Ends] C2 --> C3[Parse SSE Events] C3 --> C4{Detect Error?} C4 -->|error field<br/>non-empty| E1[Fake 200 Detected] C4 -->|message keyword<br/>match| E1 C4 -->|Empty body| E1 C4 -->|No error| C5[Success] D --> D1{Content-Length?} D1 -->|0| E2[Empty Response Error] D1 -->|Valid & > 0| D2{Inspect Body?} D1 -->|Missing/Invalid| D3[Clone & Read Body] D2 -->|HTML Content-Type| D3 D2 -->|JSON < 32KB| D3 D2 -->|Other| D4[Skip Inspection] D3 --> D5[Read up to 32KB] D5 --> D6{Detect Error?} D6 -->|HTML Document<br/>DOCTYPE/html tag| E1 D6 -->|JSON error field<br/>non-empty| E1 D6 -->|Empty body| E2 D6 -->|No strong signal| D7[Parse JSON Content] D4 --> D7 D7 --> D8{Valid Output?} D8 -->|Yes| C5 D8 -->|No content| E2 E1 --> E3[Infer Status Code] E3 --> E4{Match Pattern?} E4 -->|rate_limit| F1[429] E4 -->|unauthorized| F2[401] E4 -->|forbidden| F3[403] E4 -->|payment_required| F4[402] E4 -->|not_found| F5[404] E4 -->|No match| F6[502 default] F1 & F2 & F3 & F4 & F5 & F6 --> G[Trigger Failover] G --> H{404?} H -->|Yes| I[Skip Circuit Breaker] H -->|No| J[Record Circuit Breaker] E2 --> G I & J --> K[Update Provider Chain] K --> L[Retry Next Provider] C5 --> M[Record Success] M --> N[Update Session Binding] N --> O[Return Response]Last reviewed commit: f0b99a4