…ision chain audit
- Add ENABLE_ENDPOINT_CIRCUIT_BREAKER env var (default: false) to gate endpoint-level circuit breaker
- Gate isEndpointCircuitOpen, recordEndpointFailure, recordEndpointSuccess, triggerEndpointCircuitBreakerAlert behind env switch
- Add initEndpointCircuitBreaker() startup cleanup: clear stale Redis keys when feature disabled
- Gate endpoint filtering in endpoint-selector (getPreferredProviderEndpoints, getEndpointFilterStats)
- Fix 524 vendor-type timeout missing from decision chain: add chain entry with reason=vendor_type_all_timeout in forwarder
- Add vendor_type_all_timeout to ProviderChainItem reason union type (both backend session.ts and frontend message.ts)
- Add timeline rendering for vendor_type_all_timeout in provider-chain-formatter
- Replace hardcoded Chinese strings in provider-selector circuit_open details with i18n keys
- Add i18n translations for vendor_type_all_timeout and filterDetails (5 languages: zh-CN, zh-TW, en, ja, ru)
- Enhance LogicTraceTab to render filterDetails via i18n lookup with fallback
- Add endpoint_pool_exhausted and vendor_type_all_timeout to provider-chain-popover isActualRequest/getItemStatus
- Add comprehensive unit tests for all changes (endpoint-circuit-breaker, endpoint-selector, provider-chain-formatter)
📝 Walkthrough
Overview: This change introduces a new feature switch
Change list
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Summary of Changes
Hello @ding113, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the system's resilience and observability by introducing a configurable toggle for the endpoint-level circuit breaker, ensuring it is off by default for safer deployment. It also significantly improves the auditing of vendor-type timeouts, making critical failure reasons visible in the decision chain. Furthermore, the changes address internationalization issues and update the user interface to accurately display these new timeout and circuit breaker states.
Highlights
🧪 Test results
Overall result: ✅ All tests passed
Code Review
This pull request introduces several valuable improvements. Making the endpoint circuit breaker default to OFF via an environment variable is a safe and flexible approach for feature rollout. The addition of 524 timeout auditing to the decision chain significantly enhances observability and debugging capabilities. Furthermore, the internationalization fixes by replacing hardcoded strings with i18n keys are a great step towards better maintainability. The code is well-tested, covering the new logic extensively. I have a few suggestions to improve code consistency and maintainability.
const detailsText = f.details
  ? t(`filterDetails.${f.details}`) !== `filterDetails.${f.details}`
    ? t(`filterDetails.${f.details}`)
    : f.details
  : f.reason;
For consistency and better readability, consider using t.has() to check for the existence of a translation key, similar to how it's used in LogicTraceTab.tsx. The current method t(...) !== ... works, but t.has() is more explicit about the intent.
- const detailsText = f.details
-   ? t(`filterDetails.${f.details}`) !== `filterDetails.${f.details}`
-     ? t(`filterDetails.${f.details}`)
-     : f.details
-   : f.reason;
+ const detailsText = f.details
+   ? t.has(`filterDetails.${f.details}`)
+     ? t(`filterDetails.${f.details}`)
+     : f.details
+   : f.reason;
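The `t.has()` suggestion above can be sketched as a small helper. This is only an illustration: the `Translator` interface, `resolveFilterDetail`, and `makeTranslator` are hypothetical names, and the interface is a minimal stand-in modeled on a translator that exposes a `has` check, not the project's actual types.

```typescript
// Hypothetical minimal translator shape: callable, plus a key-existence check.
interface Translator {
  (key: string): string;
  has(key: string): boolean;
}

// Resolve a filter entry to display text:
// translated text if the key exists, else the raw detail code, else the reason.
function resolveFilterDetail(
  t: Translator,
  filter: { reason: string; details?: string },
): string {
  if (!filter.details) return filter.reason;
  const key = `filterDetails.${filter.details}`;
  return t.has(key) ? t(key) : filter.details;
}

// Tiny in-memory translator, for demonstration only.
function makeTranslator(messages: Record<string, string>): Translator {
  return Object.assign((key: string): string => messages[key] ?? key, {
    has: (key: string): boolean => key in messages,
  });
}
```

Checking existence first means the result does not depend on what the translator returns for a missing key.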
if (item.reason === "vendor_type_all_timeout") {
  timeline += `${t("timeline.vendorTypeAllTimeout")}\n\n`;

  if (item.errorDetails?.provider) {
    const p = item.errorDetails.provider;
    timeline += `${t("timeline.provider", { provider: p.name })}\n`;
    timeline += `${t("timeline.statusCode", { code: p.statusCode })}\n`;
    timeline += `${t("timeline.error", { error: p.statusText })}\n`;

    if (i > 0 && item.timestamp && chain[i - 1]?.timestamp) {
      const duration = item.timestamp - (chain[i - 1]?.timestamp || 0);
      timeline += `${t("timeline.requestDuration", { duration })}\n`;
    }

    if (p.upstreamParsed) {
      timeline += `\n${t("timeline.errorDetails")}:\n`;
      timeline += JSON.stringify(p.upstreamParsed, null, 2);
    } else if (p.upstreamBody) {
      timeline += `\n${t("timeline.errorDetails")}:\n${p.upstreamBody}`;
    }

    if (item.errorDetails?.request) {
      timeline += formatRequestDetails(item.errorDetails.request, t);
    }
  } else {
    timeline += `${t("timeline.provider", { provider: item.name })}\n`;
    if (item.statusCode) {
      timeline += `${t("timeline.statusCode", { code: item.statusCode })}\n`;
    }
    timeline += `${t("timeline.error", { error: item.errorMessage || t("timeline.unknown") })}\n`;

    if (item.errorDetails?.request) {
      timeline += formatRequestDetails(item.errorDetails.request, t);
    }
  }

  timeline += `\n${t("timeline.vendorTypeAllTimeoutNote")}`;
  continue;
}
This block for handling vendor_type_all_timeout seems to duplicate a lot of the logic for formatting provider error details that likely exists for other error reasons like retry_failed. To improve maintainability and reduce code duplication, consider extracting the common logic for rendering provider error details (provider name, status code, error message, duration, upstream body, request details) into a shared helper function.
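One possible shape for the shared helper suggested above, as a rough sketch. The `Translate` and `ProviderErrorDetails` types and the name `formatProviderError` are assumptions made for illustration; the real `ProviderChainItem` carries more fields, and only the `timeline.*` keys are taken from the snippet under review.

```typescript
// Hypothetical, simplified shapes; not the project's real types.
type Translate = (key: string, values?: Record<string, unknown>) => string;

interface ProviderErrorDetails {
  name: string;
  statusCode: number;
  statusText: string;
  upstreamParsed?: unknown;
  upstreamBody?: string;
}

// Render the provider error lines that several failure reasons have in common.
function formatProviderError(
  p: ProviderErrorDetails,
  t: Translate,
  durationMs?: number,
): string {
  let out = "";
  out += `${t("timeline.provider", { provider: p.name })}\n`;
  out += `${t("timeline.statusCode", { code: p.statusCode })}\n`;
  out += `${t("timeline.error", { error: p.statusText })}\n`;
  if (durationMs !== undefined) {
    out += `${t("timeline.requestDuration", { duration: durationMs })}\n`;
  }
  // Prefer the parsed upstream payload; fall back to the raw body.
  if (p.upstreamParsed) {
    out += `\n${t("timeline.errorDetails")}:\n${JSON.stringify(p.upstreamParsed, null, 2)}`;
  } else if (p.upstreamBody) {
    out += `\n${t("timeline.errorDetails")}:\n${p.upstreamBody}`;
  }
  return out;
}
```

Each reason-specific branch would then add only its own header and trailing note around this shared body.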
const { getEnvConfig } = await import("@/lib/config/env.schema");
if (!getEnvConfig().ENABLE_ENDPOINT_CIRCUIT_BREAKER) {
  return false;
}
To improve maintainability and reduce code repetition, consider refactoring the repeated dynamic import and check for ENABLE_ENDPOINT_CIRCUIT_BREAKER. This pattern appears in isEndpointCircuitOpen, recordEndpointFailure, recordEndpointSuccess, and triggerEndpointCircuitBreakerAlert in this file, as well as in src/lib/provider-endpoints/endpoint-selector.ts. You could cache the configuration promise at the module level to avoid multiple import() calls.
For example:
// At module scope
const envConfigPromise = import("@/lib/config/env.schema").then(m => m.getEnvConfig());

async function isEndpointCbEnabled(): Promise<boolean> {
  return (await envConfigPromise).ENABLE_ENDPOINT_CIRCUIT_BREAKER;
}

// In your functions
export async function isEndpointCircuitOpen(endpointId: number): Promise<boolean> {
  if (!(await isEndpointCbEnabled())) {
    return false;
  }
  // ...
}

  scan: vi
    .fn()
    .mockResolvedValueOnce([
      "0",
      ["endpoint_circuit_breaker:state:1", "endpoint_circuit_breaker:state:2"],
    ]),
  del: vi.fn(async () => {}),
};
This test for initEndpointCircuitBreaker is a great start. To make it more robust, consider enhancing the redis.scan mock to test the pagination logic in the do...while loop. Currently, it only covers a single scan call because the mocked cursor is immediately '0'. You could chain mockResolvedValueOnce to simulate multiple pages of keys.
Example of a multi-page scan mock:
const redisMock = {
  scan: vi.fn()
    .mockResolvedValueOnce(["1", ["key1", "key2"]]) // Page 1
    .mockResolvedValueOnce(["0", ["key3"]]), // Page 2 (last page)
  del: vi.fn().mockResolvedValue(1),
};
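For context on what a multi-page mock exercises, here is a minimal sketch of a cursor-driven cleanup loop. It is an assumption about the general shape of such a loop, not the actual body of initEndpointCircuitBreaker; `RedisLike` and `clearKeysByPattern` are hypothetical names, and the `scan`/`del` signatures are simplified.

```typescript
// Hypothetical minimal Redis surface used by the sketch.
interface RedisLike {
  scan(cursor: string, ...args: Array<string | number>): Promise<[string, string[]]>;
  del(...keys: string[]): Promise<number>;
}

// Delete every key matching `pattern`, following the SCAN cursor
// until the server hands back "0" (iteration complete).
async function clearKeysByPattern(redis: RedisLike, pattern: string): Promise<number> {
  let cursor = "0";
  let deleted = 0;
  do {
    const [next, keys] = await redis.scan(cursor, "MATCH", pattern, "COUNT", 100);
    if (keys.length > 0) {
      deleted += await redis.del(...keys);
    }
    cursor = next;
  } while (cursor !== "0");
  return deleted;
}
```

With a single-page mock the loop body runs once; a two-page mock is what verifies that the returned cursor is actually fed back into the next `scan` call.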
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/endpoint-circuit-breaker.ts (1)
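109-114: ⚠️ Potential issue | 🟡 Minor
getEndpointHealthInfo and resetEndpointCircuit lack feature-flag guards; the design intent should be documented.
The other four public functions (isEndpointCircuitOpen, recordEndpointFailure, recordEndpointSuccess, triggerEndpointCircuitBreakerAlert) all add the ENABLE_ENDPOINT_CIRCUIT_BREAKER guard, but these two do not. If this is intended to allow viewing or resetting state even when the feature is off, consider adding a comment that says so, especially since resetEndpointCircuit is called from probe.ts without checking the feature flag.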
🤖 Fix all issues with AI agents
In `@messages/ru/provider-chain.json`:
- Around line 72-77: The translation for the key rate_limited is inconsistent:
update filterDetails.rate_limited (in messages/ru/provider-chain.json) to match
filterReasons.rate_limited by replacing "Ограничение стоимости" with the correct
"Ограничение скорости" so both entries use the same "rate limit" wording; verify
no other duplicate keys (e.g., filterDetails.rate_limited vs
filterReasons.rate_limited) remain inconsistent.
- Around line 41-42: The Russian translation for the key "endpointPoolExhausted"
is grammatically incorrect ("Пул конечная точкаов"); update the value for
"endpointPoolExhausted" to use the correct genitive plural form "Пул конечных
точек исчерпан" so it matches other occurrences (e.g., the form used for
"vendorTypeAllTimeout"); ensure only the string value for the symbol
"endpointPoolExhausted" is replaced.
- Around line 208-210: The value for the "strictBlockSelectorError" message
contains a Russian typo "конечная точкаов"; update the translation to correct
Russian grammar (e.g., replace with "ошибка селектора конечных точек") so the
full string reads like "Строгий режим: ошибка селектора конечных точек,
провайдер пропущен без отката". Also scan for and fix other identical typos
elsewhere (keys around the file that currently contain "конечная
точкаов"/"конечная точкаы"/"конечная точкаа") in follow-up changes.
- Around line 54-55: The translation for the JSON key "endpoint_pool_exhausted"
contains a typo ("Пул конечная точкаов исчерпан"); update the value for
endpoint_pool_exhausted to the correct Russian phrase "Пул конечных точек
исчерпан" (leave vendor_type_all_timeout unchanged) so the file's entries for
"endpoint_pool_exhausted" and "vendor_type_all_timeout" are accurate.
In `@messages/zh-CN/provider-chain.json`:
- Around line 72-77: The translation for filterDetails.rate_limited is
inconsistent with filterReasons.rate_limited; update the value of
filterDetails.rate_limited to "速率限制" to match filterReasons.rate_limited, or if
the meanings differ, rename one of the keys (e.g., filterDetails.fee_limited)
and provide an appropriate translation to avoid ambiguity; locate these keys in
the JSON (filterDetails.rate_limited and filterReasons.rate_limited) and make
the translations/keys consistent.
In `@messages/zh-TW/provider-chain.json`:
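- Around line 72-77: The Chinese translation of filterDetails.rate_limited is
wrong; change the current value "費用限制" to "速率限制", matching
filterReasons.rate_limited in the same file. Locate and update the value of the
key filterDetails.rate_limited, and make sure it stays consistent with other
keys of the same meaning (e.g. filterReasons.rate_limited).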
In `@src/instrumentation.ts`:
- Around line 352-360: The development branch is missing the same endpoint
circuit breaker initialization used in production; add an await import and call
to initEndpointCircuitBreaker() inside the dev-environment block (the same
pattern used for startEndpointProbeScheduler and startEndpointProbeLogCleanup)
and wrap it in the same try/catch that logs via logger.warn using error
instanceof Error ? error.message : String(error) so the dev path cleans up Redis
keys and mirrors production behavior.
🧹 Nitpick comments (5)
src/lib/provider-endpoints/endpoint-selector.ts (1)
44-48: Consider making getEnvConfig a top-level import.
getPreferredProviderEndpoints is a hot-path function called on every request. Although await import(...) is cached by the runtime after the first load, this file is already marked "server-only", so getEnvConfig can safely be imported at the top level. That avoids unnecessary async overhead and matches the style of the existing top-level isEndpointCircuitOpen import.
Suggested refactor
  import "server-only";
  import { isEndpointCircuitOpen } from "@/lib/endpoint-circuit-breaker";
+ import { getEnvConfig } from "@/lib/config/env.schema";
  import { findProviderEndpointsByVendorAndType } from "@/repository";
  import type { ProviderEndpoint, ProviderType } from "@/types/provider";
Then drop the dynamic import at both call sites:
- const { getEnvConfig } = await import("@/lib/config/env.schema");
  if (!getEnvConfig().ENABLE_ENDPOINT_CIRCUIT_BREAKER) {
src/lib/endpoint-circuit-breaker.ts (1)
117-120: The repeated dynamic-import pattern could be extracted into an internal helper.
All five guarded functions repeat the same dynamic import and feature-flag check. Consider extracting an internal helper to reduce the duplication.
♻️ Suggested refactor
+ async function isEndpointCBEnabled(): Promise<boolean> {
+   const { getEnvConfig } = await import("@/lib/config/env.schema");
+   return getEnvConfig().ENABLE_ENDPOINT_CIRCUIT_BREAKER;
+ }
+
  export async function isEndpointCircuitOpen(endpointId: number): Promise<boolean> {
-   const { getEnvConfig } = await import("@/lib/config/env.schema");
-   if (!getEnvConfig().ENABLE_ENDPOINT_CIRCUIT_BREAKER) {
+   if (!(await isEndpointCBEnabled())) {
      return false;
    }
    // ...
Also applies to: 143-146, 191-194, 258-261, 312-315
tests/unit/lib/provider-endpoints/endpoint-selector.test.ts (1)
27-27: Inconsistent test naming style: existing cases use Chinese, new cases use English.
Existing test descriptions are in Chinese (e.g. lines 27 and 91), while the new ones are in English (e.g. lines 264 and 302). Consider unifying the style for readability and consistency.
Also applies to: 91-91, 264-264, 302-302
src/lib/utils/provider-chain-formatter.ts (1)
415-420: t() is called twice; this can be reduced to a single call.
t(`filterDetails.${f.details}`) is called twice: once to check whether a translation exists, and once to fetch the translated value. A temporary variable avoids the repeated call.
♻️ Suggested optimization
- const detailsText = f.details
-   ? t(`filterDetails.${f.details}`) !== `filterDetails.${f.details}`
-     ? t(`filterDetails.${f.details}`)
-     : f.details
-   : f.reason;
+ let detailsText: string;
+ if (f.details) {
+   const key = `filterDetails.${f.details}`;
+   const translated = t(key);
+   detailsText = translated !== key ? translated : f.details;
+ } else {
+   detailsText = f.reason;
+ }
tests/unit/lib/endpoint-circuit-breaker.test.ts (1)
80-82: The dynamic-import cache warm-up under fake timers is worth noting.
Under vi.useFakeTimers(), warming up await import("@/lib/config/env.schema") so that the module cache is hit is a necessary workaround. The comment explains it clearly and will help future maintainers. It also hints at the subtle timing dependency that a dynamic import on every call can introduce in production code, which further supports extracting an isEndpointCBEnabled() helper in the source.
Also applies to: 284-288
Code Review Summary
No significant issues identified in this PR.
PR Size: L
- Lines changed: 691
- Files changed: 22
- Split suggestions:
  - Extract endpoint circuit breaker gating + init cleanup + env var/docs/tests.
  - Extract 524 decision-chain recording + new vendor_type_all_timeout reason + formatter/UI rendering.
  - Extract i18n message updates (messages/*/provider-chain.json) into a dedicated PR.
Review Coverage
- Logic and correctness - No issues found
- Security (OWASP Top 10) - No issues found
- Error handling - No issues found
- Type safety - No issues found
- Documentation accuracy - No issues found
- Test coverage - Adequate
- Code clarity - No issues found
Automated review by Codex AI
- Reviewed PR #773 (head feat/endpoint-cb-default-off-and-524-chain → base dev) using the diff-only + multi-perspective criteria.
- Classified PR size as L (+657/-34, 22 files) and applied label size/L.
- No issues met the report threshold, so no inline review comments were posted; submitted the required summary review comment to the PR (including split suggestions).
Code Review Summary
This PR makes two solid operational improvements: gating the endpoint circuit breaker behind an opt-in env var and closing the 524 timeout visibility gap in the decision chain. The i18n fix for hardcoded Chinese strings is a welcome cleanup. One minor logic issue found in the i18n fallback pattern.
PR Size: L
- Lines changed: 691 (657 additions, 34 deletions)
- Files changed: 22
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 0 | 1 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
High Priority Issues (Should Fix)
provider-chain-formatter.ts:416 - i18n fallback check t(key) !== key does not account for the project's getMessageFallback, which returns namespace.key (see inline comment). Low practical impact since all current keys exist, but the fallback path is broken for any future unknown detail values.
Review Coverage
- Logic and correctness
- Security (OWASP Top 10) - Clean
- Error handling - Clean
- Type safety - Clean
- Documentation accuracy - Clean
- Test coverage - Adequate (396 new test lines covering CB disable gating, selector bypass, and formatter)
- Code clarity - Good
Automated review by Claude AI
const icon = f.reason === "circuit_open" ? "⚡" : "💰";
timeline += ` ${icon} ${f.name} (${f.details || f.reason})\n`;
const detailsText = f.details
  ? t(`filterDetails.${f.details}`) !== `filterDetails.${f.details}`
[Medium] [LOGIC-BUG] i18n fallback check t(key) !== key is incorrect given the project's getMessageFallback configuration
Why this is a problem: In src/i18n/request.ts:40-42, the getMessageFallback is configured to return ${namespace}.${key} (e.g., provider-chain.filterDetails.circuit_open). However, this fallback check compares against the key without the namespace prefix (filterDetails.circuit_open). Since the fallback return value will always differ from the bare key, this condition will always be truthy, and the raw f.details fallback will never be reached for unknown keys.
In contrast, LogicTraceTab.tsx correctly uses tChain.has() which is the proper next-intl API for key existence checks.
Currently all 4 filterDetails keys exist in all 5 language files, so this only affects future unknown detail values. When triggered, the UI would display the namespace-prefixed key (e.g., provider-chain.filterDetails.unknown_key) instead of the raw value.
Suggested fix:
const detailsText = f.details
  ? (() => {
      const translated = t(`filterDetails.${f.details}`);
      // getMessageFallback returns namespace.key for missing keys,
      // so check if the result ends with the lookup key
      return translated.endsWith(`filterDetails.${f.details}`)
        ? f.details
        : translated;
    })()
  : f.reason;

Alternatively, consider passing a has-like capability to the formatter, or simply accept the current known keys are covered and add a comment documenting this limitation.
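To make the failure mode concrete, the sketch below reproduces it with a stand-in translator. `makeNamespacedT`, `bareKeyCheck`, and `suffixCheck` are hypothetical names, and the translator only imitates the behavior the review describes (a missing key comes back as namespace.key); it is not the real i18n API.

```typescript
// Stand-in for a namespaced translator whose missing-key fallback is
// "<namespace>.<key>", as the review describes for getMessageFallback.
function makeNamespacedT(namespace: string, messages: Record<string, string>) {
  return (key: string): string => messages[key] ?? `${namespace}.${key}`;
}

const tChain = makeNamespacedT("provider-chain", {
  "filterDetails.circuit_open": "Circuit open",
});

// Check as written in the PR: compares against the bare key,
// so a missing key still looks like a successful translation.
function bareKeyCheck(details: string): string {
  const key = `filterDetails.${details}`;
  const translated = tChain(key);
  return translated !== key ? translated : details;
}

// Suggested check: a result that ends with the lookup key is treated as missing.
function suffixCheck(details: string): string {
  const key = `filterDetails.${details}`;
  const translated = tChain(key);
  return translated.endsWith(key) ? details : translated;
}
```

Under these assumptions, `bareKeyCheck("unknown_key")` leaks the namespace-prefixed key into the UI while `suffixCheck("unknown_key")` returns the raw value. The suffix heuristic would misfire if a genuine translation happened to end with its own key, which is one reason an explicit existence check is the more robust option.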
- Fix Russian: "конечная точкаов" -> "конечных точек" (11 occurrences)
- Fix Russian: "Ограничение стоимости" -> "Ограничение скорости" (rate_limited)
- Fix zh-CN: "费用限制" -> "速率限制" (filterDetails.rate_limited)
- Fix zh-TW: "費用限制" -> "速率限制" (filterDetails.rate_limited)
- Add initEndpointCircuitBreaker() to dev environment in instrumentation.ts
🧪 Test results
Overall result: ✅ All tests passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@messages/ru/provider-chain.json`:
- Line 42: The string for description.vendorTypeAllTimeout is missing the "типа
поставщика" qualifier and should match other entries
(reasons.vendor_type_all_timeout and timeline.vendorTypeAllTimeout); update the
value of description.vendorTypeAllTimeout to include "типа поставщика" so it
reads the same as the other locales and entries (e.g., "Тайм-аут всех конечных
точек типа поставщика").
  "clientError": "Ошибка клиента",
- "endpointPoolExhausted": "Пул конечная точкаов исчерпан"
+ "endpointPoolExhausted": "Пул конечных точек исчерпан",
+ "vendorTypeAllTimeout": "Тайм-аут всех конечных точек"
description.vendorTypeAllTimeout is missing "типа поставщика" (vendor type), which makes it inconsistent with the other locales.
Compare line 55 of the same file, reasons.vendor_type_all_timeout ("Тайм-аут всех конечных точек типа поставщика"), line 209, timeline.vendorTypeAllTimeout (which also contains "типа поставщика"), and line 42 of zh-TW ("供應商類型全端點逾時"). Here, on line 42, description.vendorTypeAllTimeout is only "Тайм-аут всех конечных точек", without the "vendor type" qualifier.
Suggested fix
- "vendorTypeAllTimeout": "Тайм-аут всех конечных точек"
+ "vendorTypeAllTimeout": "Тайм-аут всех конечных точек типа поставщика"
* fix(circuit-breaker): key errors should not trip endpoint circuit breaker

  Remove 3 recordEndpointFailure calls from response-handler streaming error paths (fake-200, non-200 HTTP, stream abort). These are key-level errors where the endpoint itself responded successfully. Only forwarder-level failures (timeout, network error) and probe failures should penalize the endpoint circuit breaker.

  Previously, a single bad API key could trip the endpoint breaker (threshold=3, open=5min), making ALL keys on that endpoint unavailable.

* chore: format code (dev-3d584e5)

* Merge pull request #767 from ding113/fix/provider-clone-deep-copy

  fix: cloning a provider used a shallow copy with shared references, which could unexpectedly pollute the source provider's data

* Improve input warning hints in configuration forms (#768)

  * feat: improve input warning hints in configuration forms
  * fix: fix expiresAt display and quota refresh input bounds
  * fix: fix the expiresAt parsing fallback and improve the refresh-interval input experience
  * fix: round the refresh-interval input and reuse clamp

  ---------

  Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>

* feat(circuit-breaker): endpoint CB default-off + 524 decision chain audit (#773)

  * feat(circuit-breaker): endpoint circuit breaker default-off + 524 decision chain audit
    - Add ENABLE_ENDPOINT_CIRCUIT_BREAKER env var (default: false) to gate endpoint-level circuit breaker
    - Gate isEndpointCircuitOpen, recordEndpointFailure, recordEndpointSuccess, triggerEndpointCircuitBreakerAlert behind env switch
    - Add initEndpointCircuitBreaker() startup cleanup: clear stale Redis keys when feature disabled
    - Gate endpoint filtering in endpoint-selector (getPreferredProviderEndpoints, getEndpointFilterStats)
    - Fix 524 vendor-type timeout missing from decision chain: add chain entry with reason=vendor_type_all_timeout in forwarder
    - Add vendor_type_all_timeout to ProviderChainItem reason union type (both backend session.ts and frontend message.ts)
    - Add timeline rendering for vendor_type_all_timeout in provider-chain-formatter
    - Replace hardcoded Chinese strings in provider-selector circuit_open details with i18n keys
    - Add i18n translations for vendor_type_all_timeout and filterDetails (5 languages: zh-CN, zh-TW, en, ja, ru)
    - Enhance LogicTraceTab to render filterDetails via i18n lookup with fallback
    - Add endpoint_pool_exhausted and vendor_type_all_timeout to provider-chain-popover isActualRequest/getItemStatus
    - Add comprehensive unit tests for all changes (endpoint-circuit-breaker, endpoint-selector, provider-chain-formatter)
  * fix(i18n): fix Russian grammar errors and rate_limited translations
    - Fix Russian: "конечная точкаов" -> "конечных точек" (11 occurrences)
    - Fix Russian: "Ограничение стоимости" -> "Ограничение скорости" (rate_limited)
    - Fix zh-CN: "费用限制" -> "速率限制" (filterDetails.rate_limited)
    - Fix zh-TW: "費用限制" -> "速率限制" (filterDetails.rate_limited)
    - Add initEndpointCircuitBreaker() to dev environment in instrumentation.ts
  * fix(circuit-breaker): vendor type CB respects ENABLE_ENDPOINT_CIRCUIT_BREAKER

    Make vendor type circuit breaker controlled by the same ENABLE_ENDPOINT_CIRCUIT_BREAKER switch as endpoint circuit breaker. When disabled (default), vendor type CB will never trip or block providers, resolving user confusion about "vendor type temporary circuit breaker" skip reasons in decision chain.

    Changes:
    - Add ENABLE_ENDPOINT_CIRCUIT_BREAKER check in isVendorTypeCircuitOpen()
    - Add switch check in recordVendorTypeAllEndpointsTimeout()
    - Add tests for switch on/off behavior

    Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Fix Key concurrency limit so it inherits the user's concurrency limit (#772)

  * fix: Key concurrency limit inherits the user limit by default
    - RateLimitGuard: when Key limitConcurrentSessions=0, fall back to User limitConcurrentSessions
    - Key quota/usage endpoints: display the effective concurrency limit
    - Unit tests cover the concurrency inheritance logic; add the missing endpoint-circuit-breaker mock exports to the probe tests
    - Update the biome.json schema version to match the current Biome CLI
  * docs: complete the comments on the concurrency limit resolution helper
  * refactor: merge Key limit queries and add concurrency unit tests
    - getKeyQuotaUsage/getKeyLimitUsage: fetch the User concurrency limit in one query via leftJoin, avoiding an extra query
    - Add resolveKeyConcurrentSessionLimit unit tests covering the key branches
    - Fix Biome errors in the vacuum-filter bench
  * fix: my-usage concurrency limit inherits the user limit
    - getMyQuota: when Key concurrency is 0/null, fall back to the User concurrency limit, consistent with Guard/Key quota
    - Add unit tests covering Key->User concurrency inheritance
  * test: complete the my-usage concurrency inheritance scenarios
    - MyUsageQuota.keyLimitConcurrentSessions narrowed to number (0 means unlimited)
    - OpenAPI response schema updated to non-nullable
    - my-usage concurrency inheritance tests add the Key>0 and User=0 scenarios

  ---------

  Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: hank9999 <hank9999@qq.com>
Co-authored-by: tesgth032 <tesgth032@hotmail.com>
Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
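The Key-to-User concurrency inheritance described in the #772 commit can be sketched as follows. The commit names resolveKeyConcurrentSessionLimit but does not show it, so the signature and body here are assumptions that only encode the stated rules: a Key limit of 0 or null falls back to the User limit, and 0 means unlimited.

```typescript
// Assumed signature and body; only the helper's name and the rules it encodes
// come from the commit message.
function resolveKeyConcurrentSessionLimit(
  keyLimit: number | null | undefined,
  userLimit: number | null | undefined,
): number {
  // A positive Key limit always wins.
  if (typeof keyLimit === "number" && keyLimit > 0) return keyLimit;
  // Key limit unset (0/null): inherit a positive User limit.
  if (typeof userLimit === "number" && userLimit > 0) return userLimit;
  // Neither level sets a limit: 0 is treated as "unlimited".
  return 0;
}
```

Keeping the fallback in one helper is what lets the guard, the Key quota endpoints, and my-usage report the same effective value.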
Summary
Two operational improvements to the endpoint circuit breaker system: make it opt-in (default OFF) via a new env var, and fix a gap where 524 vendor-type timeouts were invisible in the decision chain.
Problem
- Endpoint circuit breaker too aggressive by default: After "feat(circuit): unify provider-endpoint circuit visibility and notifications" (#755) unified endpoint circuit visibility, the endpoint-level CB was always active. In environments with unstable networks, this caused endpoints to be silently blocked, confusing operators who saw "enabled" endpoints that weren't actually serving traffic (similar to the pattern reported in #732, "bug: in the latest version 5.4, turning off endpoint monitoring in provider management makes every request fail").
- 524 timeout invisible in decision chain: When all endpoints for a vendor-type timed out (HTTP 524), the forwarder triggered the vendor-type circuit breaker but never called addProviderToChain, making the timeout completely invisible in logs and the UI timeline. This was part of the broader observability gap described in #754 ("Endpoint circuit breaking has insufficient visibility in the UI and decision chain") and #760 ("logic and display of provider errors").
- Hardcoded Chinese strings in filter details: provider-selector.ts contained hardcoded Chinese strings for circuit breaker states ("供应商类型临时熔断", "熔断器打开", etc.) instead of i18n keys, breaking display for non-Chinese locales.
Related Issues/PRs:
- endpoint_pool_exhausted and new reason rendering
Solution
1. Endpoint CB default-off (ENABLE_ENDPOINT_CIRCUIT_BREAKER)
- ENABLE_ENDPOINT_CIRCUIT_BREAKER (default: false) gates all endpoint-level CB functions
- isEndpointCircuitOpen returns false; recordEndpointFailure/recordEndpointSuccess/triggerAlert are no-ops
- endpoint-selector.ts skips circuit checks entirely when disabled, returning all enabled endpoints
- initEndpointCircuitBreaker clears stale Redis keys when the feature is disabled, preventing leftover open states from blocking endpoints after toggling off
2. 524 decision chain audit
- forwarder.ts: calls session.addProviderToChain() with reason: "vendor_type_all_timeout" BEFORE triggering the vendor-type circuit breaker
- vendor_type_all_timeout added to the ProviderChainItem type union
- provider-chain-formatter.ts: full timeline block rendering for the new reason (provider, status code, error details, note)
- provider-chain-popover.tsx: recognizes vendor_type_all_timeout and endpoint_pool_exhausted as actual requests with proper status icons
3. i18n fix for filter details
- provider-selector.ts: hardcoded Chinese strings replaced with i18n keys (vendor_type_circuit_open, circuit_open, circuit_half_open, rate_limited)
- filterDetails section added to all 5 language files
- LogicTraceTab.tsx and provider-chain-formatter.ts now resolve filter detail keys through i18n with fallback to raw value
Changes (22 files, +657/-34)
- env.schema.ts, .env.example, deploy.sh/ps1: ENABLE_ENDPOINT_CIRCUIT_BREAKER env var
- endpoint-circuit-breaker.ts, endpoint-selector.ts
- forwarder.ts, session.ts, provider-selector.ts
- LogicTraceTab.tsx, provider-chain-popover.tsx
- provider-chain-formatter.ts: vendor_type_all_timeout
- message.ts: vendor_type_all_timeout to reason union
- messages/{zh-CN,zh-TW,en,ja,ru}/provider-chain.json
- instrumentation.ts: initEndpointCircuitBreaker() on boot
Testing
Automated Tests
- initEndpointCircuitBreaker (Redis cleanup when disabled, no-op when enabled)
- vendor_type_all_timeout formatter (with and without error details)
- ENABLE_ENDPOINT_CIRCUIT_BREAKER=true
Manual Testing
- ENABLE_ENDPOINT_CIRCUIT_BREAKER=false (default) -> endpoint CB inactive, all enabled endpoints available
- ENABLE_ENDPOINT_CIRCUIT_BREAKER=true -> endpoint CB active, normal circuit breaker behavior
- vendor_type_all_timeout in UI timeline
Pre-commit
- bun run typecheck - pass
- bun run test - 2221 passed, 0 failed
- bun run lint - clean
- bun run build - success
Checklist
- false, preserving current behavior)
Description enhanced by Claude AI
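The ordering described in the 524 decision chain audit part of the Solution (record the chain entry first, then trip the vendor-type breaker only when the switch is on) can be illustrated with a small sketch. `SessionSketch`, `ChainEntry`, and `onAllEndpointsTimedOut` are hypothetical stand-ins; the real session.addProviderToChain() signature is not shown in this PR page and is assumed here.

```typescript
// Hypothetical, simplified chain entry; the real ProviderChainItem has more fields.
interface ChainEntry {
  name: string;
  reason: string;
  statusCode?: number;
  timestamp: number;
}

// Minimal stand-in for the session object that owns the decision chain.
class SessionSketch {
  readonly chain: ChainEntry[] = [];

  addProviderToChain(name: string, entry: Omit<ChainEntry, "name" | "timestamp">): void {
    this.chain.push({ name, timestamp: Date.now(), ...entry });
  }
}

// Hypothetical handler for the "all endpoints of this vendor-type timed out" branch.
function onAllEndpointsTimedOut(
  session: SessionSketch,
  providerName: string,
  tripVendorTypeBreaker: () => void,
  breakerEnabled: boolean,
): void {
  // 1. Always record the decision, so the timeout is visible in logs and the UI.
  session.addProviderToChain(providerName, {
    reason: "vendor_type_all_timeout",
    statusCode: 524,
  });
  // 2. Only then trip the breaker, and only if the feature switch is on.
  if (breakerEnabled) {
    tripVendorTypeBreaker();
  }
}
```

Recording before tripping means the audit entry survives even if the breaker call is skipped or fails.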
Greptile Overview
Greptile Summary
This PR makes the endpoint circuit breaker opt-in (default OFF) via ENABLE_ENDPOINT_CIRCUIT_BREAKER, fixes a critical observability gap where 524 timeouts were invisible in the decision chain, and replaces hardcoded Chinese strings with i18n keys.
Key Changes:
- isEndpointCircuitOpen returns false, recording functions become no-ops, and startup cleanup removes stale Redis keys
- forwarder.ts now calls session.addProviderToChain() with vendor_type_all_timeout reason BEFORE triggering the vendor-type circuit breaker, making timeouts visible in logs and UI
- Hardcoded Chinese strings such as "供应商类型临时熔断" replaced with i18n keys (vendor_type_circuit_open, circuit_open, etc.) in provider-selector.ts, with proper fallback rendering in UI components
Quality:
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant Client
    participant Forwarder
    participant EndpointSelector
    participant EndpointCB as Endpoint Circuit Breaker
    participant Session
    participant VendorTypeCB as Vendor-Type Circuit Breaker
    participant Redis

    Note over EndpointCB,Redis: Startup: initEndpointCircuitBreaker()
    EndpointCB->>EndpointCB: Check ENABLE_ENDPOINT_CIRCUIT_BREAKER
    alt Feature Disabled
        EndpointCB->>Redis: SCAN endpoint_circuit_breaker:state:*
        Redis-->>EndpointCB: Return stale keys
        EndpointCB->>Redis: DEL stale keys
        Note over EndpointCB: Clear in-memory state
    end

    Client->>Forwarder: POST /v1/messages
    Forwarder->>EndpointSelector: getPreferredProviderEndpoints()
    alt ENABLE_ENDPOINT_CIRCUIT_BREAKER=false
        EndpointSelector-->>Forwarder: Return all enabled endpoints (skip circuit check)
    else ENABLE_ENDPOINT_CIRCUIT_BREAKER=true
        EndpointSelector->>EndpointCB: isEndpointCircuitOpen(endpointId)
        EndpointCB->>Redis: Load circuit state
        Redis-->>EndpointCB: State
        EndpointCB-->>EndpointSelector: true/false
        EndpointSelector-->>Forwarder: Return available endpoints
    end

    Forwarder->>Forwarder: Attempt all endpoints
    Note over Forwarder: All endpoints timeout (524)
    alt All endpoints for vendor-type timed out
        Forwarder->>Session: addProviderToChain(reason: vendor_type_all_timeout)
        Note over Session: Record decision with full error details
        Forwarder->>VendorTypeCB: recordVendorTypeAllEndpointsTimeout()
        Note over VendorTypeCB: Trigger vendor-type circuit breaker
    end
    Forwarder-->>Client: Error response with decision chain
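The endpoint-selection branch of the diagram can be sketched as below. This is a simplified illustration rather than the actual getPreferredProviderEndpoints: `Endpoint`, `selectEndpoints`, and the injected `isCircuitOpen` callback are hypothetical, and the real selector also deals with ordering and repository access.

```typescript
// Hypothetical, simplified endpoint shape.
interface Endpoint {
  id: number;
  isEnabled: boolean;
}

// When the switch is off, return every enabled endpoint without consulting
// circuit state; when it is on, drop endpoints whose circuit is open.
async function selectEndpoints(
  endpoints: Endpoint[],
  breakerEnabled: boolean,
  isCircuitOpen: (endpointId: number) => Promise<boolean>,
): Promise<Endpoint[]> {
  const enabled = endpoints.filter((e) => e.isEnabled);
  if (!breakerEnabled) {
    return enabled; // skip circuit checks entirely
  }
  const open = await Promise.all(enabled.map((e) => isCircuitOpen(e.id)));
  return enabled.filter((_, i) => !open[i]);
}
```

Returning early in the disabled case means no circuit-state lookups happen at all on the request path, which matches the "skip circuit check" arrow in the diagram.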