Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the repository's review settings, and resume reviews with the usual review-management commands.
📝 Walkthrough

This change introduces endpoint probing and batch statistics/log APIs per vendor/type, supports batch reads by ID with synchronized circuit health state (including Redis batch loading and state transitions), adds a lazy-load hook that fires on first entry into the viewport, adds slim usage-log and quota aggregation on the server side, and switches the frontend to precise react-query cache invalidation and on-demand loading.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ❌ 2 failed (2 warnings) | ✅ 2 passed
Summary of Changes

Hello @tesgth032, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to comprehensively optimize the performance and stability of the Providers feature. By resolving N+1 query problems, reducing duplicate requests, introducing batch data loading, and refining the frontend rendering strategy, it significantly improves UI responsiveness and the load capacity of the backend services. Database queries are also deeply optimized to keep the system scalable and robust as data volume grows, providing a smoother and more reliable Providers management experience.
Code Review
This pull request primarily focuses on optimizing data fetching and rendering performance, particularly for provider endpoint information and usage logs. Key changes include refactoring src/actions/my-usage.ts to use a new findUsageLogsForKeySlim function and a more efficient getUserConcurrentSessions method, and removing a deprecated sumUserCost function. A new src/repository/provider-endpoints-batch.ts file was introduced to provide batch fetching capabilities for provider endpoint probe logs and vendor-type endpoint statistics, which are then utilized by new Server Actions in src/actions/provider-endpoints.ts. The batchGetEndpointCircuitInfo action was updated to use a new getAllEndpointHealthStatusAsync function for batch retrieval of circuit breaker states from Redis, improving efficiency.

Client-side components like EndpointLatencySparkline, ProviderEndpointHover, and ProviderEndpointsTable were updated to leverage these new batching actions and introduce a useInViewOnce hook for deferred loading of data when elements enter the viewport, preventing request storms. Additionally, router.refresh() calls were replaced with more granular queryClient.invalidateQueries() calls across several UI components for better cache management. The getProviderStatistics function in src/repository/provider.ts was optimized with a new SQL query structure and an in-memory cache, and src/repository/statistics.ts introduced a cache for keyId to keyString lookups to reduce database queries.

Review comments highlighted critical security vulnerabilities due to the insecure exposure of repository functions as public Server Actions in src/repository/provider-endpoints-batch.ts, src/actions/my-usage.ts (findUsageLogsForKeySlim), and src/repository/provider.ts (getProviderStatistics), which lack proper authentication and authorization. Other feedback included concerns about hardcoded API paths, inefficient useEffect dependencies, and potential cache-busting issues with the keyStringByIdCache implementation.
@@ -0,0 +1,163 @@
+ "use server";
The use of the "use server" directive in repository files is highly insecure. This directive exports all functions in the file as public Server Actions, which can be called directly from the client. Since these repository functions (e.g., findVendorTypeEndpointStatsBatch, findProviderEndpointProbeLogsBatch) do not perform any authentication or authorization checks, they are exposed as public endpoints, allowing any user to access potentially sensitive data. Repository functions should be internal helper functions called only by authorized Server Actions located in the src/actions/ directory.
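As a point of reference, here is a minimal sketch of the layering this comment asks for: the repository module exports plain functions with no "use server", and a server action wraps them with authorization and validation. The requireAdmin helper, the input schema, and the repository function's signature are assumptions for illustration, not the project's actual code.

```ts
// src/actions/provider-endpoints.ts (sketch)
"use server";

import { z } from "zod";
import { requireAdmin } from "@/lib/auth"; // hypothetical admin guard
// The repository module below carries no "use server" directive, so it is not
// callable from the client; only this action is.
import { findVendorTypeEndpointStatsBatch } from "@/repository/provider-endpoints-batch";

const inputSchema = z.object({
  vendorIds: z.array(z.number().int().positive()).max(200),
});

export async function batchGetVendorTypeEndpointStats(raw: unknown) {
  await requireAdmin();                 // authorization stays in the action layer
  const input = inputSchema.parse(raw); // validation stays in the action layer
  return findVendorTypeEndpointStatsBatch(input.vendorIds); // signature illustrative
}
```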
cacheTtlApplied: string | null;
}

export async function findUsageLogsForKeySlim(
The function findUsageLogsForKeySlim is exported from a file with the "use server" directive, making it a public Server Action. It accepts a keyString parameter and returns usage logs for that key without verifying if the authenticated user is authorized to view them. This creates an Insecure Direct Object Reference (IDOR) vulnerability, where any user can access usage logs for any API key by calling this action directly. You should remove the "use server" directive from this file and ensure that usage logs are only accessible through authorized actions that verify ownership (e.g., getMyUsageLogs in src/actions/my-usage.ts).
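A sketch of the ownership check this comment asks for, using hypothetical getSessionUser and findKeyByString helpers and an illustrative result shape; the real getMyUsageLogs will differ in its details.

```ts
"use server";

import { getSessionUser } from "@/lib/auth";         // assumption
import { findKeyByString } from "@/repository/keys"; // assumption
import { findUsageLogsForKeySlim } from "@/repository/usage-logs";

export async function getUsageLogsForOwnedKey(keyString: string, page: number, pageSize: number) {
  const user = await getSessionUser();
  if (!user) return { ok: false as const, error: "UNAUTHORIZED" };

  const key = await findKeyByString(keyString);
  if (!key || key.userId !== user.id) {
    // Same response whether the key is missing or owned by someone else,
    // so the action does not leak key existence.
    return { ok: false as const, error: "NOT_FOUND" };
  }

  // Only reached after the ownership check; the filter fields shown are a subset.
  const result = await findUsageLogsForKeySlim({ keyString, page, pageSize });
  return { ok: true as const, data: result };
}
```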
data: ProviderStatisticsRow[];
} | null = null;

export async function getProviderStatistics(): Promise<ProviderStatisticsRow[]> {
The function getProviderStatistics is exported from a file with the "use server" directive, making it a public Server Action. It returns statistics for all providers without any authorization checks. This allows any user to access potentially sensitive provider performance and cost data. Repository functions should not be exposed as Server Actions; instead, they should be called by authorized actions in the src/actions/ directory that perform role-based access control (e.g., checking for admin role).
const merged: Record<number, ProbeLog[]> = {};

for (const chunk of chunks) {
  const res = await fetch("/api/actions/providers/batchGetProviderEndpointProbeLogs", {
src/lib/hooks/use-in-view-once.ts (outdated)
observer.observe(el);
return () => observer.disconnect();
}, [isInView, options]);
if (keyStringByIdCache.size >= KEY_STRING_BY_ID_CACHE_MAX_SIZE) {
  keyStringByIdCache.clear();
}
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/repository/provider.ts`:
- Around line 900-901: The comment containing an emoji should be cleaned: remove
the "⭐" character from the comment that explains using the last item of
providerChain (provider_chain) to determine the final provider and falling back
to provider_id when provider_chain is empty; keep the rest of the Chinese text
intact and ensure references to providerChain, provider_chain, and provider_id
remain unchanged.
- Around line 976-983: The code currently casts the db.execute() result directly
to ProviderStatisticsRow[] (using "const data = result as unknown as
ProviderStatisticsRow[]"), but db.execute() from postgres-js/drizzle-orm returns
an iterable and other call sites use Array.from(result); replace the direct type
assertion by converting the result with Array.from(result) and then type it as
ProviderStatisticsRow[] before storing it in providerStatisticsCache and
returning it so the handling matches the other db.execute() usages.
🧹 Nitpick comments (18)
src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx (1)
468-474: Invalidating both the health and statistics caches after a successful edit is correct. An edit can change core fields such as key and type, so refreshing health and statistics together is reasonable.

One small consistency note: onSuccess (create flow, lines 121-125) currently only invalidates "providers" and "provider-vendors", not "providers-health"/"providers-statistics". A newly created provider has no health-check or statistics data yet, so the impact is limited, but if health status should appear immediately after creation later on, these keys may need to be added.

src/lib/hooks/use-in-view-once.ts (1)
36: Using options as an effect dependency causes the observer to be rebuilt repeatedly before the element becomes visible. If the caller passes a new object literal on every render (e.g. useInViewOnce({ rootMargin: "100px" })), the references are never equal, so the effect keeps re-running and creating/destroying the IntersectionObserver before the element is visible. Consider serializing options, asking callers to useMemo it, or shallow-comparing and caching it inside the hook.

Optional stabilization sketch:

```diff
+import { useEffect, useRef, useState, useMemo } from "react";
+
 export function useInViewOnce<T extends Element>(options?: IntersectionObserverInit) {
   const ref = useRef<T | null>(null);
   const [isInView, setIsInView] = useState(false);
+  const stableOptions = useRef(options);
+  // shallow compare to avoid re-triggering
+  if (
+    options?.root !== stableOptions.current?.root ||
+    options?.rootMargin !== stableOptions.current?.rootMargin ||
+    options?.threshold !== stableOptions.current?.threshold
+  ) {
+    stableOptions.current = options;
+  }
   useEffect(() => {
     // ...
-  }, [isInView, options]);
+  }, [isInView, stableOptions.current]);
```

src/repository/usage-logs.ts (1)
284-301: UsageLogSlimFilters and UsageLogSlimRow are not exported, which may limit external type references. Both interfaces are currently declared with interface but without export. If an external caller needs to reference the filter or row types (for example to annotate types explicitly in mocks or tests), they cannot import them directly. Right now my-usage.ts passes an inline object and the tests bypass this via mocks, so it is not a problem yet; if these types need to be referenced elsewhere later, consider adding export.

src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx (2)
39-41: The module-level deferred Map may hold stale references after components unmount. If a component unmounts before flushVendorTypeEndpointStats runs, its deferred promise's resolve/reject still sit in deferredByProviderTypeVendorId. Resolving a promise for an unmounted component does not crash anything (React Query ignores cancelled queries), but the deferred entries are not cleaned up and could accumulate under extremely rapid mount/unmount. The current setTimeout(fn, 0) window is tiny, so the practical risk is low; this is only a defensive reminder.

260-267: res.data may be undefined when fetching circuit states in batch. When res.ok is true, res.data may still be T | undefined at the type level (depending on how ActionResult is defined). If undefined cannot actually occur, consider adding ?? [] as a guard:

```diff
- const res = await batchGetEndpointCircuitInfo({ endpointIds: chunk });
- return res.ok ? res.data : [];
+ const res = await batchGetEndpointCircuitInfo({ endpointIds: chunk });
+ return res.ok ? (res.data ?? []) : [];
```

src/actions/provider-endpoints.ts (1)
795-810: forceRefresh: true triggers a Redis read every time; watch out for high-frequency scenarios. batchGetEndpointCircuitInfo uses forceRefresh: true to bypass the in-memory cache and read Redis directly, which suits the admin page's need for real-time circuit state. But if many tooltips are opened at once, or opened and closed repeatedly within a short window, this can generate a lot of Redis requests. The staleTime: 1000 * 10 (10 seconds) at the React Query layer provides some buffering. If Redis pressure becomes an issue later, consider enabling forceRefresh only on first load or on an explicit manual refresh.

src/repository/provider-endpoints-batch.ts (2)
83-111: The raw SQL hardcodes the table name provider_endpoint_probe_logs. Line 106 uses a string table name rather than a Drizzle schema reference, so if the table name changes in the schema definition this spot will not be updated with it. Consider obtaining the table name through the Drizzle schema object, or at least adding a comment documenting the correspondence with the schema.

Per the coding guidelines, the repository layer should use Drizzle ORM for data access. Using raw SQL here for the ROW_NUMBER() window function is reasonable, but referencing the schema's table name would improve maintainability. Suggestion: import the probe logs table's schema definition at the top of the file (if it exists), then reference the table name via Drizzle's getTableName() or a similar mechanism:

```diff
+import { providerEndpointProbeLogs } from "@/drizzle/schema";
+import { getTableName } from "drizzle-orm";
```

Then in the SQL:

```diff
- FROM provider_endpoint_probe_logs
+ FROM ${sql.identifier(getTableName(providerEndpointProbeLogs))}
```

If the Drizzle version does not support getTableName, fall back to adding a comment:

```
// Table name must match: providerEndpointProbeLogs in `@/drizzle/schema`
FROM provider_endpoint_probe_logs
```

As per coding guidelines: "Use Drizzle ORM for data access in the repository layer".

9-13: toDate silently falling back to new Date() may mask data anomalies. When value is neither a Date, a string, nor a number, the current time is returned. That means if the database returned an unexpected type (e.g. null), the log's createdAt would silently be replaced with the current time instead of throwing or logging a warning. Timestamp accuracy matters for the probe logs' createdAt field; consider at least logging at debug level, or pre-processing null values at the call site.

src/lib/endpoint-circuit-breaker.ts (1)
149-161: The Redis state sync only updates memory when circuitState changes. The condition on line 151, redisState.circuitState !== existingHealth.circuitState, means that if circuitState is the same but failureCount differs (e.g. another instance accumulated more failures), the in-memory failureCount is not synced from Redis. For the admin page display this can lead to inconsistent failureCount values across instances.

The current implementation is consistent with getOrCreateHealth (line 62), and forceRefresh first clears the loadedFromRedis entries so the next needsRefresh check returns true; but because of the line 151 condition, even forceRefresh will not update failureCount when the state is unchanged. If showing an accurate failureCount across multiple instances is a requirement, consider loosening the condition:

```diff
- if (!existingHealth || redisState.circuitState !== existingHealth.circuitState) {
+ if (!existingHealth
+   || redisState.circuitState !== existingHealth.circuitState
+   || redisState.failureCount !== existingHealth.failureCount) {
```

If the current behavior is intentional (to avoid unnecessary object allocations), feel free to ignore this suggestion.
tests/unit/actions/provider-endpoints.test.ts (1)
13-14: The batch repository mocks are declared but never asserted against in any test. Registering findProviderEndpointProbeLogsBatchMock and findVendorTypeEndpointStatsBatchMock as module mocks is reasonable (it avoids import resolution failures), but no test case currently verifies their call behavior. If these batch repository functions are already used in the action layer, consider adding matching test coverage.

```bash
#!/bin/bash
# Verify whether findProviderEndpointProbeLogsBatch and findVendorTypeEndpointStatsBatch are actually called in the action layer
rg -n --type=ts 'findProviderEndpointProbeLogsBatch|findVendorTypeEndpointStatsBatch' -g '!**/test*' -g '!**/*.test.*'
```

Also applies to: 57-60
src/app/[locale]/settings/providers/_components/provider-endpoints-table.tsx (1)
141-158: The chunked batch fetching of circuit states is correct; consider extracting a chunking utility. The 500-item chunking + Promise.all parallel-fetch pattern appears both in this file and in endpoint-latency-sparkline.tsx (the same MAX_ENDPOINT_IDS_PER_BATCH and chunking loop). Consider extracting it into a shared utility to reduce duplication.

♻️ Optional: extract a generic chunking helper

```ts
// e.g. in `@/lib/utils/chunk.ts`
export function chunkArray<T>(arr: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    chunks.push(arr.slice(i, i + size));
  }
  return chunks;
}
```

Then reference it in both places:

```diff
- const MAX_ENDPOINT_IDS_PER_BATCH = 500;
- const chunks: number[][] = [];
- for (let index = 0; index < endpointIds.length; index += MAX_ENDPOINT_IDS_PER_BATCH) {
-   chunks.push(endpointIds.slice(index, index + MAX_ENDPOINT_IDS_PER_BATCH));
- }
+ const chunks = chunkArray(endpointIds, 500);
```

src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx (4)
48-91: normalizeProbeLogsByEndpointId does no runtime validation of the elements inside the logs arrays. At lines 58, 72, and 85 the logs value is asserted to be ProbeLog[] without verifying that the array elements actually contain ok and latencyMs. If the backend returns an unexpected structure, the downstream .map() produces undefined values instead of an error, so the sparkline renders incorrectly without any visible failure (it fails silently). Given this is performance-optimization code and the backend format is relatively controlled, this is a defensive-programming suggestion that can be deferred.

93-142: Once isBatchProbeLogsEndpointAvailable becomes false, it never recovers for the lifetime of the page. After a 404, this module-level variable is set to false and all subsequent batch requests are skipped until the user refreshes the page. That is a reasonable degradation strategy for frontend/backend version skew (as noted in the PR description), but if the user keeps the page open for a long time while the backend has already been upgraded, the inefficient per-item request mode persists. If you want to improve this, consider adding a time window (e.g. retry the batch endpoint once after 5 minutes):

♻️ Optional: add a retry time window

```diff
-let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let batchProbeLogsDisabledAt: number | undefined;
+const BATCH_RETRY_INTERVAL_MS = 5 * 60 * 1000;

 async function tryFetchBatchProbeLogsByEndpointIds(
   endpointIds: number[],
   limit: number
 ): Promise<Record<number, ProbeLog[]> | null> {
   if (endpointIds.length <= 1) return null;
-  if (isBatchProbeLogsEndpointAvailable === false) return null;
+  if (isBatchProbeLogsEndpointAvailable === false) {
+    if (batchProbeLogsDisabledAt && Date.now() - batchProbeLogsDisabledAt < BATCH_RETRY_INTERVAL_MS) {
+      return null;
+    }
+    isBatchProbeLogsEndpointAvailable = undefined;
+  }
   if (process.env.NODE_ENV === "test") return null;
```

Also record the time in the 404 branch:

```diff
 if (res.status === 404) {
   isBatchProbeLogsEndpointAvailable = false;
+  batchProbeLogsDisabledAt = Date.now();
   return null;
 }
```

112-135: The batch chunks are processed serially, which is inconsistent with the parallel strategy in provider-endpoints-table.tsx. Here for (const chunk of chunks) is serial, while the circuit-state chunks in provider-endpoints-table.tsx use Promise.all(chunks.map(...)). For large probe-log volumes the serial approach may be intentional (to reduce server load), but it is worth documenting the reason in a comment so future maintainers are not confused. Also, at line 130, if an intermediate chunk's normalized result returns null, the results of the already-processed chunks are discarded and the whole call degrades to per-item requests (those endpoints get fetched again). Low impact, but worth noting.

183-227: The ProbeLogsBatcher micro-batching design is sound; one small defensive suggestion. The 10 ms setTimeout window + grouping by limit + snapshot-then-clear flush strategy are all good. However, at line 199 the void this.flush() would produce an unhandled promise rejection if an unexpected exception occurred outside the Promise.all in flush() (extremely unlikely). Consider adding a .catch:

♻️ Optional: add a top-level catch guard

```diff
- this.flushTimer = setTimeout(() => {
-   this.flushTimer = null;
-   void this.flush();
- }, delayMs);
+ this.flushTimer = setTimeout(() => {
+   this.flushTimer = null;
+   this.flush().catch(() => {});
+ }, delayMs);
```

src/repository/provider.ts (3)
870-876: Consider exporting the ProviderStatisticsRow type. getProviderStatistics is an exported function, but its return type ProviderStatisticsRow is not exported. Callers cannot reference the type by name and can only infer it via ReturnType<> and the like, which hurts type reuse. Suggested change:

```diff
-type ProviderStatisticsRow = {
+export type ProviderStatisticsRow = {
   id: number;
   today_cost: string;
   today_calls: number;
   last_call_time: Date | null;
   last_call_model: string | null;
 };
```
886-898: The cache does not deduplicate concurrent requests (thundering herd). At the moment the cache expires, if several concurrent requests hit getProviderStatistics at the same time, they all pass the cache check (expiresAt > now is false) and each issues its own DB query. With a 10-second TTL the probability is low, but under high-frequency polling this can still cause brief query spikes. If such spikes are observed in the future, consider promise-level deduplication (cache the in-flight query's Promise so later requests reuse the same Promise) instead of only caching the result data. The current implementation should be acceptable at the present scale.
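A minimal sketch of that promise-level deduplication, with the row shape, TTL, and fetcher stubbed out as placeholders rather than the repository's actual code:

```ts
type ProviderStatisticsRow = { id: number; today_cost: string }; // illustrative shape

let cache: { expiresAt: number; data: ProviderStatisticsRow[] } | null = null;
let inFlight: Promise<ProviderStatisticsRow[]> | null = null;
const TTL_MS = 10_000;

async function fetchStatisticsFromDb(): Promise<ProviderStatisticsRow[]> {
  return []; // placeholder for the real db.execute(query) call
}

export async function getProviderStatisticsDeduped(): Promise<ProviderStatisticsRow[]> {
  if (cache && cache.expiresAt > Date.now()) return cache.data;
  if (inFlight) return inFlight; // reuse the query that is already running

  // Assign synchronously, before any await, so a concurrent caller arriving on
  // the next tick sees the in-flight promise and does not start a second query.
  inFlight = fetchStatisticsFromDb()
    .then((data) => {
      cache = { expiresAt: Date.now() + TTL_MS, data };
      return data;
    })
    .finally(() => {
      inFlight = null;
    });

  return inFlight;
}
```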
903-908: The AT TIME ZONE conversion logic in the bounds CTE is correct, but it is repeated three times and could be simplified. The DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone} pattern appears three times (today_start, tomorrow_start, last7_start). You could define today_start first and derive the other two with arithmetic expressions to reduce the repetition:

Simplification suggestion:

```diff
 WITH bounds AS (
   SELECT
     (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) AS today_start,
-    (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) + INTERVAL '1 day' AS tomorrow_start,
-    (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) - INTERVAL '7 days' AS last7_start
+    (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) + INTERVAL '1 day' AS tomorrow_start
+ ),
+ bounds_ext AS (
+   SELECT *, today_start - INTERVAL '7 days' AS last7_start FROM bounds
 ),
```

Or keep a single CTE and use PostgreSQL's column-reference capabilities (note that a CTE cannot directly reference sibling columns). The current version is functionally fine; this is purely a readability suggestion.
Code Review Summary
This is a well-structured performance optimization PR that addresses N+1 query patterns and request storms in the Providers management UI. The changes introduce batch APIs, micro-batching for probe logs, and lazy loading via intersection observers.
PR Size: L
- Lines changed: 1,840 (1,556 additions, 284 deletions)
- Files changed: 23
Note: As a Large PR, consider whether this could be split into smaller, more focused changes for easier review:
- Backend batch APIs (repository + actions)
- Frontend batching/deferred loading hooks
- UI component updates
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
Review Coverage
- Logic and correctness - Clean
- Security (OWASP Top 10) - Clean
- Error handling - Clean (graceful fallbacks are intentional)
- Type safety - Clean
- Documentation accuracy - Clean
- Test coverage - Adequate (tests updated for new batch functions)
- Code clarity - Good
Notable Design Decisions

- Micro-batching pattern (ProbeLogsBatcher, requestVendorTypeEndpointStatsBatched): uses module-level state with setTimeout debouncing to coalesce rapid requests. This is a valid pattern for reducing request storms (a generic sketch of this pattern follows the review summary below).
- Concurrent worker pattern in fetchProbeLogsByEndpointIds: the idx increment is synchronous before any await, making it safe in JavaScript's single-threaded execution model.
- Graceful degradation: empty catch blocks in fallback paths intentionally suppress errors to allow fallback to individual API calls when batch endpoints are unavailable (e.g., during rolling deployments).
- In-memory caches: TTL-based caches in statistics.ts and usage-logs.ts reduce DB load for frequently accessed data. Cache invalidation is handled via TTL expiration.
Automated review by Claude AI
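For illustration, here is a generic micro-batcher in the same spirit as the ProbeLogsBatcher pattern noted above — a sketch only, not the project's implementation; the 10 ms default window, the Map-based result shape, and the fetchBatch signature are assumptions.

```ts
// Generic micro-batcher: callers enqueue keys individually, and one batched
// fetch runs after a short window, coalescing rapid mount events.
class MicroBatcher<K, V> {
  private pending = new Map<K, Array<(value: V | undefined) => void>>();
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private readonly fetchBatch: (keys: K[]) => Promise<Map<K, V>>,
    private readonly windowMs = 10
  ) {}

  request(key: K): Promise<V | undefined> {
    return new Promise((resolve) => {
      const waiters = this.pending.get(key) ?? [];
      waiters.push(resolve);
      this.pending.set(key, waiters);
      if (this.timer === null) {
        this.timer = setTimeout(() => {
          this.timer = null;
          void this.flush();
        }, this.windowMs);
      }
    });
  }

  private async flush(): Promise<void> {
    const snapshot = this.pending; // snapshot-then-clear: new callers start a fresh batch
    this.pending = new Map();
    try {
      const results = await this.fetchBatch([...snapshot.keys()]);
      for (const [key, waiters] of snapshot) {
        for (const resolve of waiters) resolve(results.get(key));
      }
    } catch {
      // Degrade gracefully: resolve with undefined so callers can fall back.
      for (const waiters of snapshot.values()) {
        for (const resolve of waiters) resolve(undefined);
      }
    }
  }
}
```

Encapsulating the state in a class rather than in module-level variables also addresses the testability concern raised later in this thread.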
const delayMs = process.env.NODE_ENV === "test" ? 0 : 10;
this.flushTimer = setTimeout(() => {
  this.flushTimer = null;
  void this.flush().catch(() => {});
}, delayMs);
10ms micro-batch window effectively coalesces rapid mount events (scrolling through list) with no perceptible user delay
let errorCount = 0;
const errorSamples: Array<{ endpointId: number; error: string }> = [];

for (let index = 0; index < batchIds.length; index++) {
  const endpointId = batchIds[index];
  const [error, data] = results[index] ?? [];

  if (error) {
    errorCount += 1;
    if (errorSamples.length < 3) {
      errorSamples.push({
        endpointId,
        error: error instanceof Error ? error.message : String(error),
      });
    }
    continue;
well-designed error tolerance pattern - samples first 3 errors to avoid log spam while batch failures don't abort entire operation
// Defensive: ensure per-endpoint limit, even if SQL changes or driver behavior differs.
for (const [endpointId, logs] of map) {
  if (logs.length > limitPerEndpoint) {
    map.set(endpointId, logs.slice(0, limitPerEndpoint));
  }
}
defensive slicing ensures per-endpoint limit even if SQL windowing behavior differs - good safety net
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx`:
- Around line 48-91: normalizeProbeLogsByEndpointId asserts the arrays to ProbeLog[] without validating the element structure (in the Array.isArray(data) branch, the logsByEndpointId branch, and the items branch), which can pass malformed data downstream. In these three branches, do minimal structural validation of the logs arrays: confirm the value is an array, check that at least the first element (or every element) contains the required fields (such as ok and latencyMs/timestamp), filter out entries that do not match the structure, and only assign to the map after validation passes (or replace invalid/missing-field entries with safe objects carrying defaults), so the downstream SparkPoint mapping never receives undefined fields.
In `@src/lib/hooks/use-in-view-once.ts`:
- Around line 77-97: The effect in use-in-view-once.ts can return early when
ref.current is null (delayed mount) and never re-run because dependencies omit
the element; update the logic so the effect depends on the actual element (or
use a callback ref) to ensure the observer is created when the element mounts:
either include ref.current (or a local state like observedEl set by a ref
callback) in the effect dependency array and use that element for
observer.observe, or refactor the hook to expose/accept a callback ref that
assigns the element to state and triggers the IntersectionObserver creation;
make sure to still guard for test/IntersectionObserver absence, call
setIsInView(true) when appropriate, and disconnect the observer in the cleanup.
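A sketch of the callback-ref variant that prompt describes — an illustration of the approach, not the repository's final use-in-view-once implementation:

```ts
import { useCallback, useEffect, useState } from "react";

export function useInViewOnce<T extends Element>(options?: IntersectionObserverInit) {
  const [element, setElement] = useState<T | null>(null);
  const [isInView, setIsInView] = useState(false);

  // Callback ref: state updates when the element actually mounts, so the effect
  // below re-runs with the real node even if mounting is delayed.
  const ref = useCallback((node: T | null) => setElement(node), []);

  useEffect(() => {
    if (isInView || !element) return;
    if (typeof IntersectionObserver === "undefined") {
      setIsInView(true); // e.g. test environments: treat as visible
      return;
    }
    const observer = new IntersectionObserver((entries) => {
      if (entries.some((entry) => entry.isIntersecting)) {
        setIsInView(true);
        observer.disconnect();
      }
    }, options);
    observer.observe(element);
    return () => observer.disconnect();
    // Callers should memoize `options` (or stabilize it as discussed earlier)
    // to avoid re-creating the observer on every render.
  }, [element, isInView, options]);

  return { ref, isInView };
}
```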
🧹 Nitpick comments (7)
src/repository/provider.ts (2)
988-992: The cache expiresAt uses the now captured before the query. now is captured at line 897 (before the query runs) but used at line 990 to compute the cache expiry. If the query takes long (e.g. a slow-query scenario), the effective TTL of the cache is shortened. Use a fresh timestamp when writing to the cache:

Suggested change:

```diff
 providerStatisticsCache = {
   timezone,
-  expiresAt: now + PROVIDER_STATISTICS_CACHE_TTL_MS,
+  expiresAt: Date.now() + PROVIDER_STATISTICS_CACHE_TTL_MS,
   data,
 };
```

920-962: The final_provider_id CASE expression is duplicated in two CTEs. The CASE WHEN provider_chain IS NULL OR provider_chain = '[]'::jsonb THEN provider_id ELSE (provider_chain->-1->>'id')::int END in provider_stats and latest_call is identical. Consider extracting a shared CTE (e.g. resolved_requests) that computes final_provider_id for the later CTEs to reference, reducing the risk of the two copies drifting apart during maintenance.

src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx (1)
121-127: The query invalidation pattern is reasonable; consider extracting it for reuse. The new providers-health and providers-statistics invalidation logic is repeated in four places (create / delete / edit / toggle; toggle reasonably omits statistics). That is acceptable at the current scale, but maintenance cost will grow if more query keys are added later. Consider extracting an invalidateProviderQueries(scope) helper to manage this in one place.

Example: extract an invalidation helper

```ts
// at the top of the component file or in a standalone util
function invalidateProviderQueries(
  queryClient: ReturnType<typeof useQueryClient>,
  scope: "full" | "toggle" = "full",
) {
  queryClient.invalidateQueries({ queryKey: ["providers"] });
  queryClient.invalidateQueries({ queryKey: ["providers-health"] });
  queryClient.invalidateQueries({ queryKey: ["provider-vendors"] });
  if (scope === "full") {
    queryClient.invalidateQueries({ queryKey: ["providers-statistics"] });
  }
}
```

src/lib/hooks/use-in-view-once.ts (1)
53-64: Reading and writing a ref during render is a hazard under React 19 Strict Mode. useStableIntersectionObserverOptions reads and writes stableOptionsRef.current during the render phase (lines 59-60). React 19 Strict Mode expects render functions to be pure; reading/writing refs during render usually works in practice but is flagged as discouraged in the official React docs. If Strict Mode's double render makes the comparison produce different results between the two renders, subtle issues could appear. Optional alternatives: defer the update via a useRef + useEffect combination, or use useMemo with a serialized key to keep a stable reference. In the actual usage here (options almost never changes) the risk is extremely low; marked as an optional improvement.

src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx (3)
93: The module-level mutable state isBatchProbeLogsEndpointAvailable cannot recover. After a 404, isBatchProbeLogsEndpointAvailable is permanently set to false (until a page refresh). If the backend returns 404 only temporarily (deployment gap, blue-green switch, etc.), the user has to refresh the page to restore the batch path. Consider a TTL-based reset, e.g. retry the batch endpoint after 5 minutes.

Example of adding a TTL reset:

```diff
-let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let batchDisabledAt: number | undefined;
+const BATCH_RETRY_INTERVAL_MS = 5 * 60 * 1000;
+
+function isBatchDisabled(): boolean {
+  if (isBatchProbeLogsEndpointAvailable === false) {
+    if (batchDisabledAt && Date.now() - batchDisabledAt > BATCH_RETRY_INTERVAL_MS) {
+      isBatchProbeLogsEndpointAvailable = undefined;
+      batchDisabledAt = undefined;
+      return false;
+    }
+    return true;
+  }
+  return false;
+}
```

Then record the time in the 404 handling:

```diff
 if (res.status === 404) {
   isBatchProbeLogsEndpointAvailable = false;
+  batchDisabledAt = Date.now();
```

95-206: Roughly 250 lines of batch data-fetching logic should be considered for extraction into a standalone module. tryFetchBatchProbeLogsByEndpointIds, fetchProbeLogsByEndpointIdsIndividually, fetchProbeLogsByEndpointIds, normalizeProbeLogsByEndpointId, and the ProbeLogsBatcher class form a complete data-fetching layer, which is a different responsibility from the UI component (sparkline rendering). Extracting them to something like @/lib/probe-logs-batcher.ts would help to:

- test the batch / degradation / merge logic independently
- reuse the batcher in other components
- reduce the cognitive load of the current file

Also applies to: 208-236, 238-245, 247-307
216-232: The concurrent worker pattern is correct but relies on JavaScript's single-threaded semantics — worth a clarifying comment. The idx variable is shared across multiple async workers; the read and increment happen in the same synchronous section (before any await), so it is safe under the JS event-loop model. But the pattern is not obvious, so add a one-line comment explaining what its safety depends on.
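For readers unfamiliar with it, here is a self-contained sketch of that shared-index worker pool; the function name and concurrency limit are illustrative:

```ts
async function mapWithConcurrency<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
  concurrency = 4
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let idx = 0; // shared cursor across workers

  async function run(): Promise<void> {
    while (true) {
      const current = idx;
      if (current >= items.length) return;
      // Safe because the read and increment happen synchronously, before the
      // first await below — two workers can never claim the same slot.
      idx += 1;
      results[current] = await worker(items[current]);
    }
  }

  await Promise.all(Array.from({ length: Math.min(concurrency, items.length) }, run));
  return results;
}
```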
Following up on the CodeRabbit review with a small round of fixes (no changes to existing UX), mainly closing out some corner cases and consistency issues.

Verified locally.
Addressed CodeRabbit's latest two actionable comments and, while at it, fixed a small cache TTL detail (no changes to existing UX or interface semantics).

Local checks and CI both pass (build/lint/typecheck/test all green), and CodeRabbit is back to Approved.
Completed another round of #779-related work on "hot-path performance + genuine-bug convergence + security hardening" (keeping existing UX and interface semantics unchanged as much as possible).

Verified locally.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/app/[locale]/settings/providers/_components/forms/provider-form/index.tsx (1)
370-374: ⚠️ Potential issue | 🟠 Major. Invalidating only the provider-vendors cache on successful creation matches the PR's goal, but the edit path has a flaw: the edit operation performs no form-level cache invalidation and relies entirely on the optional onSuccess callback. If the callback fails (lines 378-380 only log the error), the cache is never refreshed. Suggested improvement: either add form-level invalidation to the edit path matching the create path (e.g. queryClient.invalidateQueries({ queryKey: ["providers"] })), or make onSuccess a required parameter to guarantee reliable cache refresh.

tests/unit/actions/total-usage-semantics.test.ts (1)
48-52: ⚠️ Potential issue | 🟡 Minor. Missing mock for SessionTracker.getUserSessionCount. getMyQuota now calls getUserConcurrentSessions(userId) → SessionTracker.getUserSessionCount(userId), but the SessionTracker mock here only provides getKeySessionCount, not getUserSessionCount. getUserConcurrentSessions has an internal try-catch that falls back to 0, so the test does not fail, but that is "silently swallowing the error" rather than correct mock behavior.

🛡️ Suggested mock addition:

```diff
 vi.mock("@/lib/session-tracker", () => ({
   SessionTracker: {
     getKeySessionCount: (...args: unknown[]) => getKeySessionCountMock(...args),
+    getUserSessionCount: vi.fn().mockResolvedValue(0),
   },
 }));
```
🤖 Fix all issues with AI agents
In `@src/actions/provider-endpoints.ts`:
- Around line 567-583: The code calls resetEndpointCircuitState again after a
successful probe, creating redundant Redis round-trips because
probeProviderEndpointAndRecordByEndpoint already calls resetEndpointCircuit when
appropriate; remove the entire conditional block that checks result.ok and tries
to call resetEndpointCircuitState (the try/catch logging calling
resetEndpointCircuitState with endpoint.id) from
src/actions/provider-endpoints.ts so the single reset path inside
probeProviderEndpointAndRecordByEndpoint (which invokes resetEndpointCircuit) is
the only place handling circuit resets.
In `@src/repository/statistics.ts`:
- Around line 984-1019: The outer WHERE uses scanEnd as an upper bound which can
truncate the costTotal aggregate (filtered only by createdAt >= cutoffDate); fix
by computing scanEnd to include the current time so it cannot be earlier than
now — i.e. set scanEnd = new Date(Math.max(..., Date.now())) when building
scanEnd (the same pattern should be applied in the analogous
sumKeyQuotaCostsById logic); update uses of scanEnd (and any other range
end-time max calculations) so costTotal (and related totals) are not
artificially upper-bounded by stale range endTimes.
🧹 Nitpick comments (8)
src/repository/usage-logs.ts (3)
380-417: The COUNT and data queries could run in parallel. In findUsageLogsForKeySlim, the count query (line 380) and the paginated data query (line 388) run sequentially even though they are independent; Promise.all would reduce latency.

♻️ Suggestion: run the queries in parallel with Promise.all

```diff
- const [countResult] = await db
-   .select({ totalRows: sql<number>`count(*)::double precision` })
-   .from(messageRequest)
-   .where(and(...conditions));
-
- const total = countResult?.totalRows ?? 0;
-
- const offset = (safePage - 1) * safePageSize;
- const results = await db
-   .select({
-     id: messageRequest.id,
-     ...
-   })
-   .from(messageRequest)
-   .where(and(...conditions))
-   .orderBy(desc(messageRequest.createdAt), desc(messageRequest.id))
-   .limit(safePageSize)
-   .offset(offset);
+ const offset = (safePage - 1) * safePageSize;
+ const [countResult, results] = await Promise.all([
+   db
+     .select({ totalRows: sql<number>`count(*)::double precision` })
+     .from(messageRequest)
+     .where(and(...conditions)),
+   db
+     .select({
+       id: messageRequest.id,
+       // ... remaining fields
+     })
+     .from(messageRequest)
+     .where(and(...conditions))
+     .orderBy(desc(messageRequest.createdAt), desc(messageRequest.id))
+     .limit(safePageSize)
+     .offset(offset),
+ ]);
+
+ const total = countResult?.[0]?.totalRows ?? 0;
```

420-444: The cache write uses the caller-supplied now for expiresAt instead of the current time. As discussed in the PR comments, expiresAt should use Date.now() rather than a now captured by the caller before the query, to avoid shortening the cache's effective TTL under slow queries. The current setDistinctKeyOptionsCache uses the caller's now (captured before the DB query), which can reduce the effective TTL.

♻️ Suggestion: use Date.now() inside setDistinctKeyOptionsCache

```diff
 function setDistinctKeyOptionsCache(
   cache: Map<string, { data: string[]; expiresAt: number }>,
   key: string,
   data: string[],
-  now: number
 ): void {
+  const now = Date.now();
   if (cache.size >= DISTINCT_KEY_OPTIONS_CACHE_MAX_SIZE) {
     for (const [k, v] of cache) {
       if (v.expiresAt <= now) {
         cache.delete(k);
       }
     }
     if (cache.size >= DISTINCT_KEY_OPTIONS_CACHE_MAX_SIZE) {
       cache.clear();
     }
   }
   cache.set(key, { data, expiresAt: now + DISTINCT_KEY_OPTIONS_CACHE_TTL_MS });
 }
```

284-301: UsageLogSlimFilters and UsageLogSlimRow are not exported. findUsageLogsForKeySlim is an exported function, but its parameter type UsageLogSlimFilters and the UsageLogSlimRow in its return value are not exported. TypeScript's structural typing lets callers pass literal objects, but external code that needs to reference these types (e.g. building type-safe arguments in a test mock) is limited. The AI summary also lists these as newly added public types.

♻️ Suggestion: export both interfaces

```diff
-interface UsageLogSlimFilters {
+export interface UsageLogSlimFilters {
   keyString: string;
   ...
 }

-interface UsageLogSlimRow {
+export interface UsageLogSlimRow {
   id: number;
   ...
 }
```

Also applies to: 303-319
src/actions/my-usage.ts (1)
600-648: The keyBreakdown and userBreakdown queries select inconsistent fields. The keyBreakdown query includes cacheCreation5mTokens and cacheCreation1hTokens (lines 611-612), but the userBreakdown query does not (lines 627-636). These two fields are only used for summary-level statistics (reduced from keyBreakdown) and do not affect the userModelBreakdown mapping logic, but having two parallel queries with different select lists is confusing, and if these fields are ever needed in userModelBreakdown the code will have to change again.

src/lib/provider-endpoints/endpoint-selector.ts (2)
41-45: findEnabledProviderEndpointsByVendorAndType already filters isEnabled and deletedAt at the DB layer, so the application-layer filter is redundant. Its query conditions already include is_enabled = true AND deleted_at IS NULL (see src/repository/provider-endpoints.ts lines 823-828), so the e.isEnabled && !e.deletedAt check at line 45 is duplicated. Only the excludeSet.has(e.id) filter is necessary.

♻️ Simplify the filtering logic

```diff
- const filtered = endpoints.filter((e) => e.isEnabled && !e.deletedAt && !excludeSet.has(e.id));
+ const filtered = excludeSet.size > 0
+   ? endpoints.filter((e) => !excludeSet.has(e.id))
+   : endpoints;
```

81-98: In getEndpointFilterStats the enabled-endpoint filter runs twice. Lines 83 and 90 apply the same e.isEnabled && !e.deletedAt filter to the same array; the result of a single filter can be reused.

♻️ Eliminate the duplicate filter

```diff
 const endpoints = await findProviderEndpointsByVendorAndType(input.vendorId, input.providerType);
 const total = endpoints.length;
- const enabled = endpoints.filter((e) => e.isEnabled && !e.deletedAt).length;
+ const enabledEndpoints = endpoints.filter((e) => e.isEnabled && !e.deletedAt);
+ const enabled = enabledEndpoints.length;

 // When endpoint circuit breaker is disabled, no endpoints can be circuit-open
 if (!getEnvConfig().ENABLE_ENDPOINT_CIRCUIT_BREAKER) {
   return { total, enabled, circuitOpen: 0, available: enabled };
 }

- const enabledEndpoints = endpoints.filter((e) => e.isEnabled && !e.deletedAt);
 if (enabledEndpoints.length === 0) {
   return { total, enabled: 0, circuitOpen: 0, available: 0 };
 }
```

src/repository/provider-endpoints.ts (1)
793-833: Consider reusing the providerEndpointSelectFields constant to reduce duplication. The .select(...) fields in findEnabledProviderEndpointsByVendorAndType are identical to those in findProviderEndpointsByVendorAndType (lines 762-779) and to providerEndpointSelectFields defined at line 209. Reusing the constant directly would reduce the maintenance burden.

♻️ Suggested change

```diff
 const rows = await db
-  .select({
-    id: providerEndpoints.id,
-    vendorId: providerEndpoints.vendorId,
-    providerType: providerEndpoints.providerType,
-    url: providerEndpoints.url,
-    label: providerEndpoints.label,
-    sortOrder: providerEndpoints.sortOrder,
-    isEnabled: providerEndpoints.isEnabled,
-    lastProbedAt: providerEndpoints.lastProbedAt,
-    lastProbeOk: providerEndpoints.lastProbeOk,
-    lastProbeStatusCode: providerEndpoints.lastProbeStatusCode,
-    lastProbeLatencyMs: providerEndpoints.lastProbeLatencyMs,
-    lastProbeErrorType: providerEndpoints.lastProbeErrorType,
-    lastProbeErrorMessage: providerEndpoints.lastProbeErrorMessage,
-    createdAt: providerEndpoints.createdAt,
-    updatedAt: providerEndpoints.updatedAt,
-    deletedAt: providerEndpoints.deletedAt,
-  })
+  .select(providerEndpointSelectFields)
   .from(providerEndpoints)
```

tests/unit/lib/provider-endpoints/endpoint-selector.test.ts (1)
108-131: The health-status mock type definition is repeated across several test blocks. The return type of getAllStatusMock (a Record containing fields like failureCount and circuitState) is repeated three times at lines 108-131, 206-229, and 291-314. Consider extracting the type into a file-level type alias or adding a helper function to construct the mock to reduce the repetition; readability comes first in test code, though, so this is only an optional optimization.
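One possible shape for such a helper, sketched with field names taken from the health record shown elsewhere in this thread; the literal circuit-state values are assumptions and the real type may differ.

```ts
type EndpointHealth = {
  failureCount: number;
  lastFailureTime: number | null;
  circuitState: "closed" | "open" | "half-open"; // assumed literals
  circuitOpenUntil: number | null;
  halfOpenSuccessCount: number;
};

function makeHealth(overrides: Partial<EndpointHealth> = {}): EndpointHealth {
  return {
    failureCount: 0,
    lastFailureTime: null,
    circuitState: "closed",
    circuitOpenUntil: null,
    halfOpenSuccessCount: 0,
    ...overrides,
  };
}

// usage in a test:
// getAllStatusMock.mockResolvedValue({ 1: makeHealth(), 2: makeHealth({ circuitState: "open" }) });
```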
const shouldResetCircuit =
  parsed.data.url !== undefined ||
  (parsed.data.isEnabled === true && previous.isEnabled === false);
Circuit reset triggers on unchanged URL
shouldResetCircuit checks parsed.data.url !== undefined, but the edit form (EditEndpointDialog) always sends the url field — even when only label or sortOrder was changed. This means every edit will reset the circuit breaker, regardless of what actually changed.
Since this is best-effort and has no functional impact beyond an extra Redis write, this is low severity. However, comparing against the previous value would be more precise:
Suggested change:

```diff
- const shouldResetCircuit =
-   parsed.data.url !== undefined ||
-   (parsed.data.isEnabled === true && previous.isEnabled === false);
+ const shouldResetCircuit =
+   (parsed.data.url !== undefined && parsed.data.url !== previous.url) ||
+   (parsed.data.isEnabled === true && previous.isEnabled === false);
```
src/repository/usage-logs.ts (outdated)
const safePage = page > 0 ? page : 1;
const safePageSize = Math.min(100, Math.max(1, pageSize));

const conditions = [isNull(messageRequest.deletedAt), eq(messageRequest.key, keyString)];
Missing EXCLUDE_WARMUP_CONDITION in slim query
findUsageLogsForKeySlim does not include EXCLUDE_WARMUP_CONDITION in its conditions array, unlike the original findUsageLogsWithDetails which applies it. This means warmup requests will now be counted in usage logs results for users, potentially showing inflated request counts and costs.
For comparison, findUsageLogsStats at line 885 explicitly adds EXCLUDE_WARMUP_CONDITION to statsConditions, and the old findUsageLogsWithDetails also applies it.
Suggested change:

```diff
- const conditions = [isNull(messageRequest.deletedAt), eq(messageRequest.key, keyString)];
+ const conditions = [isNull(messageRequest.deletedAt), eq(messageRequest.key, keyString), EXCLUDE_WARMUP_CONDITION];
```
const query = sql`
  SELECT
    id,
    endpoint_id as "endpointId",
    source,
    ok,
    status_code as "statusCode",
    latency_ms as "latencyMs",
    error_type as "errorType",
    error_message as "errorMessage",
    created_at as "createdAt"
  FROM (
    SELECT
      id,
      endpoint_id,
      source,
      ok,
      status_code,
      latency_ms,
      error_type,
      error_message,
      created_at,
      ROW_NUMBER() OVER (PARTITION BY endpoint_id ORDER BY created_at DESC) AS row_num
    FROM provider_endpoint_probe_logs
    WHERE endpoint_id IN (${idList})
  ) ranked
  WHERE ranked.row_num <= ${limitPerEndpoint}
  ORDER BY ranked.endpoint_id ASC, ranked.created_at DESC
`;
SQL injection via unsanitized endpoint IDs
The endpointIds are interpolated into raw SQL via sql.join without parameterization. While these are numbers validated by the Zod schema at the action layer, this raw SQL construction pattern bypasses Drizzle's built-in parameterization. If this function were ever called from a path without prior validation, it would be vulnerable.
Consider using Drizzle's inArray operator (as done in findVendorTypeEndpointStatsBatch above) or ensure the values are parameterized:
Suggested change (only the WHERE clause inside the ranked subquery changes; the rest of the query stays as shown above):

```diff
-    WHERE endpoint_id IN (${idList})
+    WHERE endpoint_id = ANY(${endpointIds})
```
Using = ANY($1) with a proper array parameter is both safer and cleaner than building an inline IN (...) list.
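As an alternative sketch, the lookup can also stay entirely in Drizzle's query builder with inArray, applying the per-endpoint limit in memory instead of via a window function. The db and schema imports, and the camelCase column names, are assumptions here.

```ts
import { desc, inArray } from "drizzle-orm";
import { db } from "@/db";                                    // assumption
import { providerEndpointProbeLogs } from "@/drizzle/schema"; // assumption

async function findRecentProbeLogs(endpointIds: number[], limitPerEndpoint: number) {
  type Row = typeof providerEndpointProbeLogs.$inferSelect;
  const byEndpoint = new Map<number, Row[]>();
  if (endpointIds.length === 0) return byEndpoint;

  const rows = await db
    .select()
    .from(providerEndpointProbeLogs)
    .where(inArray(providerEndpointProbeLogs.endpointId, endpointIds)) // parameterized list
    .orderBy(desc(providerEndpointProbeLogs.createdAt));

  // Apply the per-endpoint limit in memory — simpler than ROW_NUMBER(), but it
  // fetches more rows when endpoints have long probe histories.
  for (const row of rows) {
    const list = byEndpoint.get(row.endpointId) ?? [];
    if (list.length < limitPerEndpoint) list.push(row);
    byEndpoint.set(row.endpointId, list);
  }
  return byEndpoint;
}
```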
@@ -482,9 +468,7 @@ export async function getMyUsageLogs(
    minRetryCount: filters.minRetryCount,
    page,
    pageSize,
  };

  const result = await findUsageLogsWithDetails(usageFilters);
  });
Missing sessionId filter in slim query call
The old code passed all filters from UsageLogFilters to findUsageLogsWithDetails, but this new call to findUsageLogsForKeySlim does not pass filters.sessionId. The UsageLogSlimFilters interface does accept sessionId, so this looks like an accidental omission. Users filtering by session ID in the usage logs UI will get unfiltered results.
src/repository/provider.ts (outdated)
    ELSE (provider_chain->-1->>'id')::int
  END AS final_provider_id,
  cost_usd
FROM message_request, bounds
Implicit cross join with bounds CTE
The FROM message_request, bounds syntax is a cross join. Since bounds returns exactly one row, this is functionally correct and equivalent to using CROSS JOIN bounds. However, it relies on the CTE always returning exactly one row — if bounds ever returned zero rows (e.g., due to a CURRENT_TIMESTAMP edge case in some PostgreSQL configurations), the entire subquery would return no rows silently.
This is safe in practice since bounds uses deterministic DATE_TRUNC(CURRENT_TIMESTAMP) expressions, but an explicit CROSS JOIN would make the intent clearer:
FROM message_request CROSS JOIN bounds
  unhealthy: 0,
  unknown: 0,
});
}
Module-level mutable state shared across components
pendingVendorIdsByProviderType, deferredByProviderTypeVendorId, and vendorStatsFlushTimer are module-level mutable variables. While this batching pattern works well for coalescing concurrent requests, it means:
- The state persists across React component unmount/remount cycles — deferred promises from unmounted components will still be resolved.
- In a Fast Refresh (HMR) scenario during development, old deferred callbacks may reference stale closures.
This is a known trade-off of the micro-batching pattern and is unlikely to cause user-visible issues in production, but it's worth noting for maintainability. Consider wrapping this logic in a class (similar to ProbeLogsBatcher in endpoint-latency-sparkline.tsx) for better encapsulation and testability.
Followed up on the latest AI reviews (CodeRabbit / Greptile / Gemini) with another round of convergence (keeping existing UX and interface semantics unchanged as much as possible).

Verified locally. @coderabbitai — the two points from the previous "changes requested" round have been covered and fixed; please re-review / update the status when convenient.
src/repository/provider.ts (outdated)
  return data;
})();

providerStatisticsInFlight = { timezone, promise: fetchPromise };
In-flight set after IIFE starts
providerStatisticsInFlight is assigned after the async IIFE has already started executing on line 910. While this is safe in practice (the IIFE won't yield until the first await db.execute(query) on line 978, by which point line 997 has already executed), this ordering makes the code fragile — if any await were added earlier in the IIFE (e.g., for logging or validation), a concurrent caller arriving between lines 910 and 997 would bypass the dedup and start a second query.
Consider assigning the in-flight record before the IIFE begins executing, or wrapping the pattern with a helper that atomically sets the in-flight promise.
src/actions/my-usage.ts (outdated)
db
  .select({
    model: messageRequest.model,
    requests: sql<number>`count(*)::int`,
    cost: sql<string>`COALESCE(sum(${messageRequest.costUsd}), 0)`,
    inputTokens: sql<number>`COALESCE(sum(${messageRequest.inputTokens}), 0)::double precision`,
    outputTokens: sql<number>`COALESCE(sum(${messageRequest.outputTokens}), 0)::double precision`,
    cacheCreationTokens: sql<number>`COALESCE(sum(${messageRequest.cacheCreationInputTokens}), 0)::double precision`,
    cacheReadTokens: sql<number>`COALESCE(sum(${messageRequest.cacheReadInputTokens}), 0)::double precision`,
  })
userBreakdown missing 5m/1h cache columns
The keyBreakdown query selects cacheCreation5mTokens and cacheCreation1hTokens (lines 614-615), but userBreakdown does not. While this doesn't cause a runtime error (the ModelBreakdownItem type doesn't include these fields, and only summaryAcc uses them from keyBreakdown), it creates an asymmetry that could mislead future maintainers into thinking the two queries have identical schemas.
Consider adding the same columns to userBreakdown for consistency, or leaving a comment explaining why they're intentionally omitted.
✅ Removed the redundant circuit-breaker reset call, and you have already completed full local verification (build, lint, typecheck, test). I will now re-review all changes in this PR.

✅ Actions performed: Review triggered.
const existingHealth = healthMap.get(endpointId);
if (!existingHealth || redisState.circuitState !== existingHealth.circuitState) {
  healthMap.set(endpointId, {
    failureCount: redisState.failureCount,
    lastFailureTime: redisState.lastFailureTime,
    circuitState: redisState.circuitState,
    circuitOpenUntil: redisState.circuitOpenUntil,
    halfOpenSuccessCount: redisState.halfOpenSuccessCount,
  });
}
continue;
Stale failureCount/halfOpenSuccessCount when only circuitState differs
On line 151, the condition redisState.circuitState !== existingHealth.circuitState gates whether the in-memory health map is updated. If two instances share Redis state and one increments failureCount while the circuit remains closed, this instance will not pick up the updated failure count because the circuit state hasn't changed.
This means the local failure count can drift from Redis until a state transition occurs. For the circuit breaker's intended purpose (blocking requests to failing endpoints), this is low severity — the state transitions (closed→open) will still propagate. But it means admin dashboards reading from this function may show stale failure counts.
src/repository/provider.ts (outdated)
} catch (error) {
  deferred.reject(error);
}
Error path does not clear providerStatisticsInFlight
When deferred.reject(error) is called on line 1011, the rejection propagates to the caller's await deferred.promise on line 1016, where the finally block clears providerStatisticsInFlight. However, if the caller's finally block throws (unlikely but possible), the in-flight record would leak.
More practically: because providerStatisticsInFlight is set to this deferred on line 922 and only cleared in the caller's finally, there's a window where the IIFE has rejected but providerStatisticsInFlight still references the rejected promise. During this window, a concurrent caller on line 906 would await the already-rejected promise and get the same error — which is actually the correct dedup behavior. So this is safe, just worth noting that the error is shared across all waiters.
How can I resolve this? If you propose a fix, please make it concise.| const keyString = keyRecord?.[0]?.key ?? null; | ||
| if (!keyString) { | ||
| keyStringByIdCache.delete(keyId); | ||
| return null; | ||
| } | ||
|
|
||
| // Size 控制:先清理过期;仍超限则按 FIFO 淘汰一小部分,避免一次性清空导致缓存击穿 | ||
| if (keyStringByIdCache.size >= KEY_STRING_BY_ID_CACHE_MAX_SIZE) { | ||
| for (const [id, value] of keyStringByIdCache) { | ||
| if (value.expiresAt <= now) { | ||
| keyStringByIdCache.delete(id); | ||
| } | ||
| } | ||
|
|
||
| if (keyStringByIdCache.size >= KEY_STRING_BY_ID_CACHE_MAX_SIZE) { | ||
| const evictCount = Math.max(1, Math.ceil(KEY_STRING_BY_ID_CACHE_MAX_SIZE * 0.1)); | ||
| let remaining = evictCount; | ||
|
|
||
| for (const id of keyStringByIdCache.keys()) { |
There was a problem hiding this comment.
FIFO eviction doesn't evict the oldest entries
The eviction logic uses Map iteration order (insertion order), evicting the first 10% of entries. After expired entries are cleaned, the remaining entries may have been refreshed recently, but Map preserves insertion order from the original set call — not the most recent one. When a cache hit refreshes TTL via set, the entry keeps its original position in the iteration order.
This means a frequently accessed entry inserted early will be evicted before an infrequently accessed entry inserted later. Consider deleting and re-inserting on cache refresh to maintain LRU-like ordering.
This is low severity since the cache is small (1000 entries) and 5-minute TTL naturally evicts stale entries.
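A minimal sketch of the delete-and-re-insert refresh suggested here, assuming a cache shaped like `Map<number, { key: string; expiresAt: number }>` (field names are illustrative):

```typescript
type CachedKey = { key: string; expiresAt: number };
const cache = new Map<number, CachedKey>();

function getCached(keyId: number, now: number, ttlMs: number): string | null {
  const hit = cache.get(keyId);
  if (!hit || hit.expiresAt <= now) return null;

  // Re-insert on hit so Map iteration order approximates least-recently-used:
  // the first keys yielded by cache.keys() are then the coldest eviction candidates.
  cache.delete(keyId);
  cache.set(keyId, { key: hit.key, expiresAt: now + ttlMs });
  return hit.key;
}
```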
This round on PR #782 fills in and fixes a few more details (all related to the #779 performance/stability work, keeping the existing UX unchanged as far as possible).
Local verification passed.
Diff context — `src/lib/provider-endpoints/endpoint-selector.ts` (line 45): the endpoints returned for `(input.vendorId, input.providerType)` are filtered with `e.isEnabled && !e.deletedAt && !excludeSet.has(e.id)`.
Redundant filter after query change
findEnabledProviderEndpointsByVendorAndType already filters for isEnabled = true and deletedAt IS NULL at the SQL level, so the e.isEnabled && !e.deletedAt checks here are redundant. Only !excludeSet.has(e.id) is still needed.
Suggested change: `const filtered = endpoints.filter((e) => !excludeSet.has(e.id));`
This update syncs with dev and follows up on a few review concerns.
Verified locally.
This round additionally merges into PR #782 the fix for issue #781 (stale endpoints / stale types lingering in Endpoint Health and being probed continuously), again keeping the existing UX.
Local verification passed.
Diff context — `src/repository/provider-endpoints.ts` (lines 324–341, outdated): the probe-target query selects enabled, non-deleted endpoints and adds a `(vendor_id, provider_type) IN (SELECT p.provider_vendor_id, p.provider_type FROM providers p WHERE p.is_enabled = true AND p.deleted_at IS NULL)` condition.
Correlated subquery executes for every endpoint row, potentially degrading performance under load
The (vendor_id, provider_type) IN (SELECT ...) pattern creates a correlated subquery that PostgreSQL may not optimize into a hash semi-join. For large endpoint tables scanned on each probe cycle, this could become a bottleneck.
Consider using a JOIN or IN subquery with materialization:
```sql
INNER JOIN (
  SELECT DISTINCT provider_vendor_id, provider_type
  FROM providers
  WHERE is_enabled = true AND deleted_at IS NULL
) enabled_providers
  ON provider_endpoints.vendor_id = enabled_providers.provider_vendor_id
 AND provider_endpoints.provider_type = enabled_providers.provider_type
```
This allows PostgreSQL to build the enabled-providers set once and probe it via hash lookup.

Diff context — `src/repository/provider.ts` (lines 615–622): `syncProviderEndpointOnProviderEdit` is called when `previousUrl` and `previousProviderType` exist and the URL, the provider type, or (when known) the vendor ID has changed.
is_enabled check missing in endpoint sync condition
When only url or providerType changes (but previousIsEnabled stays false), the code still calls syncProviderEndpointOnProviderEdit. However, line 612 checks previousIsEnabled === false && transformed.isEnabled === true for the ensureProviderEndpointExistsForUrl path.
This creates an asymmetry: if a disabled provider's URL changes, it syncs the endpoint (line 610-635) but doesn't ensure existence (line 636-644). Verify this is intentional behavior, or consolidate the isEnabled transition handling.
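One way to make the two transitions explicit — a sketch only, modelling the decision as a pure function rather than the repository's actual control flow:

```typescript
type ProviderIdentity = {
  url: string;
  providerType: string;
  providerVendorId: number | null;
  isEnabled: boolean;
};

// Sketch: derive both endpoint-sync paths from two named flags instead of
// two slightly different inline conditions.
function planEndpointSync(prev: ProviderIdentity, next: ProviderIdentity) {
  const identityChanged =
    prev.url !== next.url ||
    prev.providerType !== next.providerType ||
    (prev.providerVendorId != null && prev.providerVendorId !== next.providerVendorId);
  const becameEnabled = !prev.isEnabled && next.isEnabled;

  return {
    // move the existing endpoint pool entry to the new identity
    shouldSyncExistingEndpoint: identityChanged,
    // make sure an endpoint row exists for the current URL
    shouldEnsureEndpointExists: becameEnabled || (identityChanged && next.isEnabled),
  };
}
```

Whether a URL change on a still-disabled provider should also ensure existence is exactly the behavioural question the comment raises; the flags just make that decision visible.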
Diff context — `src/lib/endpoint-circuit-breaker.ts` (lines 161–178): in this path all Redis fields (`failureCount`, `lastFailureTime`, `circuitState`, `circuitOpenUntil`, `halfOpenSuccessCount`) are now copied into the in-memory entry unconditionally; the inline comment notes that the Redis→memory sync must not be gated on `circuitState` changes alone, because the counter fields must also stay consistent under `forceRefresh`.
Memory sync only updates when circuitState differs, not when counters change
Line 167 gates the memory update on redisState.circuitState !== existingHealth.circuitState. If two instances share Redis and one increments failureCount while the circuit stays closed, this instance won't pick up the updated counter until a state transition occurs.
For getAllEndpointHealthStatusAsync with forceRefresh=true, this was addressed (lines 161-178 always sync all fields). However, for the single-endpoint getOrCreateHealth path (line 67), stale counters may persist. Consider always syncing all fields when Redis returns data, not just on state changes.
Diff context — `src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx` (lines 184–187): when no chunk succeeded, `isBatchProbeLogsEndpointAvailable` is set to `false` and `batchProbeLogsEndpointDisabledAt` is stamped with `Date.now()`.
Batch endpoint availability only set on first 404 when no chunks succeeded
If chunk 1 succeeds but chunk 2 returns 404, didAnyChunkSucceed is true, so isBatchProbeLogsEndpointAvailable won't be set to false. This is correct for rolling deployments (some nodes may lack the route temporarily).
However, if the very first batch request across all chunks hits 404 before any succeed, the endpoint is marked unavailable for 5 minutes. Consider a more gradual backoff or exponential retry instead of a fixed 5-minute penalty on first failure.
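A minimal sketch of the gradual backoff suggested here, instead of a fixed 5-minute penalty on the first all-404 batch (the constants and variable names are illustrative):

```typescript
// Illustrative client-side backoff: each consecutive "all chunks failed" result
// doubles the cool-down, capped at 5 minutes; any successful chunk resets it.
let consecutiveBatchFailures = 0;
let batchDisabledUntil = 0;

function onBatchResult(didAnyChunkSucceed: boolean, now = Date.now()): void {
  if (didAnyChunkSucceed) {
    consecutiveBatchFailures = 0;
    batchDisabledUntil = 0;
    return;
  }
  consecutiveBatchFailures += 1;
  const delayMs = Math.min(15_000 * 2 ** (consecutiveBatchFailures - 1), 5 * 60_000);
  batchDisabledUntil = now + delayMs;
}

function isBatchEndpointUsable(now = Date.now()): boolean {
  return now >= batchDisabledUntil;
}
```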
Diff context — `src/repository/provider.ts` (lines 895–918): the batch delete deduplicates candidate `(providerVendorId, providerType, url)` keys into a map, then walks them in chunks of 200, building a tuple list with `sql.join` to check which keys are still actively referenced.
Sequential queries inside transaction for batch delete
The for (const candidate of candidates) loop issues a SELECT + conditional UPDATE for each unique (vendor, type, url) combination. For large batch deletes (50+ providers), this holds the transaction open for many round-trips, increasing lock contention risk.
Consider batching the active reference checks into a single query using inArray with tuples or a CTE to reduce round-trips within the transaction.
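A sketch of what a single batched reference check could look like with Drizzle's `sql` template — one `VALUES` list joined against `providers` per chunk, in one round-trip. Table and column names follow the identifiers quoted above; treat it as pseudocode against the real schema:

```typescript
import { sql } from "drizzle-orm";

type EndpointKey = { vendorId: number; providerType: string; url: string };

// Builds one query that answers "is this key still referenced by an active provider?"
// for every candidate key at once, instead of a SELECT per key.
function buildActiveReferenceQuery(keys: EndpointKey[]) {
  const values = sql.join(
    keys.map((k) => sql`(${k.vendorId}, ${k.providerType}, ${k.url})`),
    sql`, `
  );
  return sql`
    WITH candidate(vendor_id, provider_type, url) AS (VALUES ${values})
    SELECT c.vendor_id, c.provider_type, c.url,
           EXISTS (
             SELECT 1 FROM providers p
             WHERE p.provider_vendor_id = c.vendor_id
               AND p.provider_type = c.provider_type
               AND p.url = c.url
               AND p.deleted_at IS NULL
           ) AS is_referenced
    FROM candidate c
  `;
}
```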
Diff context — `src/repository/statistics.ts` (lines 51–60): per the inline comment, the size check first removes expired entries and, if the cache is still over the limit, evicts only the oldest slice rather than clearing everything, to avoid a thundering herd on the database.
FIFO eviction doesn't maintain LRU ordering on cache refresh
When a cache hit occurs, set refreshes the TTL but preserves the entry's original insertion position in Map iteration order. During eviction (line 51-60), older frequently-accessed entries may be evicted before newer rarely-accessed ones.
For true LRU behavior, delete and re-insert on cache hits to update iteration order. Low severity since 5-minute TTL naturally evicts stale data.
This update (rebased on the latest dev).
Local verification passed.
CI status: all GitHub Actions checks passed; the Greptile review is still pending (it may need a moment to refresh).
Diff context — `src/repository/provider-endpoints.ts` (lines 324–342, outdated): the same probe-target query with the `(vendor_id, provider_type) IN (SELECT ...)` condition, flagged again below.
correlated subquery executes for every endpoint row in probe query
The (vendor_id, provider_type) IN (SELECT ...) pattern creates a correlated subquery checked per endpoint row. For large endpoint tables scanned on each probe cycle, this degrades performance significantly.
Replace with explicit JOIN for hash semi-join optimization:
```suggestion
  .from(providerEndpoints)
  .innerJoin(
    db
      .selectDistinct({
        vendorId: providers.providerVendorId,
        providerType: providers.providerType,
      })
      .from(providers)
      .where(and(eq(providers.isEnabled, true), isNull(providers.deletedAt)))
      .as('enabled_providers'),
    and(
      eq(providerEndpoints.vendorId, sql.raw('enabled_providers.vendor_id')),
      eq(providerEndpoints.providerType, sql.raw('enabled_providers.provider_type'))
    )
  )
  .where(
    and(
      eq(providerEndpoints.isEnabled, true),
      isNull(providerEndpoints.deletedAt)
    )
  )
```
Diff context — `src/repository/provider.ts` (lines 895–920): the same chunked tuple-list loop over the deduplicated endpoint keys inside the batch-delete transaction.
sequential queries inside transaction for batch delete increase lock contention
The loop issues SELECT + conditional UPDATE for each unique (vendor, type, url). For large batch deletes (50+ providers), this holds the transaction open across many sequential queries, risking lock contention.
Batch the active reference checks into a single query using a CTE or VALUES list to reduce round-trips within the transaction.
Follow-up: another round of small fixes and performance hardening (no UX changes), mainly targeting the corner-case performance and consistency issues raised by Greptile/review:

- Provider statistics: remove the implicit cross join and tighten in-flight cleanup; `deleteProvidersBatch` makes fewer round-trips inside the transaction
- Providers hover: micro-batching is isolated per QueryClient and supports `AbortSignal`, reducing cross-talk and potential leaks
- Probe / circuit breaker / caches: the probe-target query now uses a join; counter fields are updated when syncing from Redis; the statistics cache keeps its FIFO semantics
- My Usage: `userBreakdown` now includes the 5m/1h cache aggregation columns (not yet shown in the UI)

Local checks have been re-run and pass.
Diff context — `src/repository/provider-endpoints-batch.ts` (line 80): `limitPerEndpoint` is clamped to at least 1 and the endpoint filter is built with `inArray(providerEndpointProbeLogs.endpointId, endpointIds)`.
using inArray with Drizzle's parameterization is more secure than building SQL strings, even with validated integers
action layer validates with Zod, but prefer parameterized approach for defense-in-depth:
Suggested (already parameterized) form: `const endpointIdCondition = inArray(providerEndpointProbeLogs.endpointId, endpointIds);`
How can I resolve this? If you propose a fix, please make it concise.| ) | ||
| SELECT | ||
| p.id, | ||
| COALESCE( | ||
| SUM(CASE | ||
| WHEN (mr.created_at AT TIME ZONE ${timezone})::date = (CURRENT_TIMESTAMP AT TIME ZONE ${timezone})::date | ||
| AND ( | ||
| -- 情况1:无重试(provider_chain 为 NULL 或空数组),使用 provider_id | ||
| (mr.provider_chain IS NULL OR jsonb_array_length(mr.provider_chain) = 0) AND mr.provider_id = p.id | ||
| OR | ||
| -- 情况2:有重试,使用 providerChain 最后一项的 id | ||
| (mr.provider_chain IS NOT NULL AND jsonb_array_length(mr.provider_chain) > 0 | ||
| AND (mr.provider_chain->-1->>'id')::int = p.id) | ||
| ) | ||
| THEN mr.cost_usd ELSE 0 END), | ||
| 0 | ||
| ) AS today_cost, | ||
| COUNT(CASE | ||
| WHEN (mr.created_at AT TIME ZONE ${timezone})::date = (CURRENT_TIMESTAMP AT TIME ZONE ${timezone})::date | ||
| AND ( | ||
| (mr.provider_chain IS NULL OR jsonb_array_length(mr.provider_chain) = 0) AND mr.provider_id = p.id | ||
| OR | ||
| (mr.provider_chain IS NOT NULL AND jsonb_array_length(mr.provider_chain) > 0 | ||
| AND (mr.provider_chain->-1->>'id')::int = p.id) | ||
| ) | ||
| THEN 1 END)::integer AS today_calls | ||
| COALESCE(ps.today_cost, 0) AS today_cost, | ||
| COALESCE(ps.today_calls, 0) AS today_calls, | ||
| lc.last_call_time, |
There was a problem hiding this comment.
tomorrow_start computed as plain timestamp (not timestamptz) due to inner AT TIME ZONE
comparison with timestamptz column relies on PostgreSQL's implicit cast using session timezone - works in UTC but could behave unexpectedly in non-UTC sessions
wrap with outer AT TIME ZONE for type consistency:
```suggestion
  today_start AT TIME ZONE '${userTimezone}' AS today_start,
  (today_start + INTERVAL '1 day') AT TIME ZONE '${userTimezone}' AS tomorrow_start,
  (today_start - INTERVAL '6 days') AT TIME ZONE '${userTimezone}' AS last7_start
```

This round on PR #782 continues to close remaining gaps (still strictly scoped to #779/#781), mainly reining in a few cases of refresh amplification on the success path and multi-instance circuit-breaker recovery one more level, and fixing the same kind of SQL regression in my-usage:

- `batchUpdateProviders` now goes through `updateProvidersBatch`; when providers are batch-enabled from a disabled state, missing `provider_endpoints` rows are inserted best-effort
- This avoids historical or racy data leaving no usable endpoint under the strict endpoint policy after enabling, which would otherwise block requests

Local verification passed.
Diff context — `src/repository/provider.ts` (line 1140): the inline comment notes that when `provider_chain` is empty (no retries) the `provider_id` field is used; the `WITH provider_stats AS (...)` query is started inside `const promise = Promise.resolve().then(async () => { ... })`.
Cache not invalidated on error path
When the IIFE created by Promise.resolve().then(async () => { ... }) throws, providerStatisticsCache is never updated — which is correct (stale-on-error). However, the providerStatisticsInFlight dedup relies on try/finally at line 1234 to clean up, but exceptions thrown before providerStatisticsInFlight is assigned (line 1231) would propagate to the outer catch at line 1240 without any dedup state to clear.
More importantly, Promise.resolve().then(...) creates a microtask where uncaught rejections inside the IIFE will only propagate when the promise is awaited at line 1235. If resolveSystemTimezone() (called before the IIFE, at line 1126) were to be moved inside the IIFE in a future refactor, and it threw synchronously before the IIFE's first await, the error path would still work correctly due to the try/finally wrapping.
This is safe as-written, but the Promise.resolve().then(...) pattern is somewhat unusual for dedup. A simpler approach would be to make the function async and assign the dedup promise directly from the call, avoiding the indirection.
How can I resolve this? If you propose a fix, please make it concise.| if (showSkeleton) { | ||
| return ( | ||
| <div ref={ref} className="flex items-center gap-2"> | ||
| <div className="h-6 w-32 rounded bg-muted/20" /> |
There was a problem hiding this comment.
isLoading preferred over isFetching for skeleton display
showSkeleton uses isLoading (which is the correct choice for first-load skeletons — this was fixed from an earlier version that used isFetching). However, the condition !isInView || isLoading means that even after data has loaded, if isInView somehow toggles back to false (which shouldn't happen with useInViewOnce), it would show skeleton again.
Since useInViewOnce guarantees isInView only transitions from false → true (never back), this is safe. Just noting for clarity.
Additional changes in this round:

- AvailabilityDashboard: suppress overlapping/out-of-order refreshes; throttle the forced refresh on foreground/background switches
- Probe scheduler/cleanup: idle DB polling plus lock renewal, reducing pointless scans and concurrent cleanup
- Endpoint circuit breaker: Redis sync throttled to 1s
- My Usage: key/user breakdown merged into a single aggregation
- DB: new partial-index migrations for `message_request` key+model/endpoint; fixed journal monotonicity validation and self-healing of the migrations table's `created_at`
Diff context — `src/app/providers.tsx` (lines 17–19, outdated): the global React Query defaults set `staleTime: 30_000` and `refetchOnWindowFocus: false`.
Global refetchOnWindowFocus: false affects all queries
Setting refetchOnWindowFocus: false and staleTime: 30_000 as global defaults applies to every React Query in the application, not just the providers pages targeted by this PR. Pages like the dashboard overview, usage statistics, and leaderboard previously relied on the default refetchOnWindowFocus: true to update data when users return to the tab.
While many provider-related queries already set refetchOnWindowFocus: false individually, applying it globally could cause other pages to feel stale — for example, a user switching between a terminal and the usage dashboard would no longer see fresh data without manual refresh.
Consider whether these defaults should remain global, or if they should only be set on the specific queries affected by the request storm issue.
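If the defaults were kept per-query instead of global, the affected queries would carry the options themselves — a sketch with an assumed query key and fetcher, not the actual provider-page code:

```typescript
import { useQuery } from "@tanstack/react-query";

// Hypothetical fetcher; the real Server Action / API call has a different name.
declare function fetchVendorEndpointStats(vendorId: number): Promise<unknown>;

// Only the queries hit by the request storm opt out of focus refetch,
// while the rest of the app keeps React Query's defaults.
function useVendorEndpointStats(vendorId: number) {
  return useQuery({
    queryKey: ["provider-endpoints", "vendor-stats", vendorId],
    queryFn: () => fetchVendorEndpointStats(vendorId),
    staleTime: 30_000,
    refetchOnWindowFocus: false,
  });
}
```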
Diff context — `src/repository/provider.ts` (line 1140): the same `Promise.resolve().then(async () => { ... })` wrapper around the `WITH provider_stats AS (...)` statistics query.
In-flight dedup assigned after IIFE starts executing
providerStatisticsInFlight is set on line 1231, but the IIFE created by Promise.resolve().then(async () => { ... }) is already executing as a microtask by the time line 1231 runs. In practice this is safe because Promise.resolve().then(...) schedules the callback as a microtask that won't run until the current synchronous execution completes (including line 1231). However, this ordering is fragile — if this function were ever refactored to use await between the IIFE creation and the providerStatisticsInFlight assignment, concurrent callers could bypass the dedup check.
A more robust pattern would assign the in-flight record synchronously before starting the async work:
Suggested change: `const promise = (async () => {`
Diff context — `src/repository/provider-endpoints.ts` (line 1640): inside the `while (true)` backfill loop, `whereClauses` previously held only `isNull(providers.deletedAt)` and `gt(providers.id, lastProviderId)`; the new version adds the enabled-only filter discussed below.
Backfill only processes enabled providers, skipping disabled ones
The new eq(providers.isEnabled, true) filter means the backfill will no longer create endpoint pool entries for disabled providers. This is intentional per the comment, but could be a subtle behavioral change: if an admin later enables a previously-disabled provider, the endpoint pool entry won't exist until the next syncProviderEndpointOnProviderEdit or updateProvidersBatch call.
The updateProvidersBatch path (line 851-927) does handle this case (it inserts endpoints when enabling), and updateProvider also handles the previousIsEnabled === false && transformed.isEnabled === true transition. Just confirming this coverage is intentional — the backfill is no longer the safety net for disabled→enabled transitions.
Supplementary commit (ec3cf1fe).
Local checks re-run and passed.
Supplementary commit (4093ddd1, still focused on the perceived Endpoint Health issues from #781).
Local checks re-run and passed.
Diff context — `src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx` (lines 275–279): each worker reads `endpointIds[idx]`, then increments `idx`, and returns once the value is `null`/`undefined`.
race condition in concurrent index increment
while the comment claims single-threaded safety, idx++ happens after async boundary. if worker A reads idx=0 but yields before incrementing, worker B can also read idx=0, causing both to process endpointIds[0] twice
use atomic pattern:
const current = idx++;
if (current >= endpointIds.length) return;
const endpointId = endpointIds[current];
How can I resolve this? If you propose a fix, please make it concise.| const endpointValues = sql.join( | ||
| endpointIds.map((id) => sql`(${id})`), | ||
| sql`, ` | ||
| ); | ||
|
|
||
| const query = sql` | ||
| WITH endpoint_ids(endpoint_id) AS ( | ||
| VALUES ${endpointValues} |
There was a problem hiding this comment.
endpoint IDs interpolated into raw SQL via sql.join
while Zod validates these as integers at action layer, prefer Drizzle's parameterized inArray for defense-in-depth:
```typescript
const query = sql`
  SELECT ... FROM ${providerEndpointProbeLogs}
  WHERE endpoint_id = ANY(${sql.array(endpointIds)})
  ORDER BY ...
`;
```
or use `inArray(providerEndpointProbeLogs.endpointId, endpointIds)` in the WHERE clause.

Diff context — `src/repository/provider.ts` (lines 945–948): `uniqueIds` are joined into an `idList` with `sql.join` for interpolation into the raw statistics SQL.
IDs interpolated into raw SQL via sql.join
prefer parameterized inArray:
```typescript
.where(and(
  inArray(providers.id, uniqueIds),
  isNull(providers.deletedAt)
))
```
Diff context — `src/repository/statistics.ts` (lines 64–72): when still over the limit, `evictCount` is set to 10% of `KEY_STRING_BY_ID_CACHE_MAX_SIZE` (at least 1) and that many keys are deleted in `Map` iteration order.
FIFO eviction doesn't maintain true LRU ordering on cache refresh
Map iteration order reflects original insertion time. when cache hit refreshes TTL via set (line 76), entry keeps its original position in iteration order. frequently-accessed entries inserted early will be evicted before rarely-accessed entries inserted later
for true LRU, delete then re-insert on cache hit:
keyStringByIdCache.delete(keyId);
keyStringByIdCache.set(keyId, cached);
Supplementary commit (fdc1470d, still focused on the long-list resource usage of the Providers page from #779):

- Reuse an observer pool keyed by (root + options), reducing the number of IntersectionObserver instances under long lists and large tables
- Add unit-test coverage (test-env pass-through plus the shared/release semantics)

Local checks re-run and passed.
Related:
ding113/claude-code-hub#779, ding113/claude-code-hub#781

Background / problem

Main changes (UX kept unchanged as far as possible; a micro-batching sketch follows this list)

- `batchGetVendorTypeEndpointStats` computes enabled-endpoint counts per vendor/type in batch; endpoint details and circuit-breaker state are only fetched when the tooltip opens
- `batchGetProviderEndpointProbeLogs` allows partially successful chunks to coexist with 404s/failures and degrades on demand; requests are removed from the batch on query cancel/unmount, reducing pointless flushes
- `getAllEndpointHealthStatusAsync` syncs Redis state into memory; runtime endpoint selection also reads circuit-breaker states in batch, reducing Redis round-trips
- `resetEndpointCircuit`; probe writes are silently ignored when the endpoint has been deleted concurrently (avoiding FK failures interrupting the task)
- `syncProviderEndpointOnProviderEdit` prefers reading the active row; on a 23505 during revive it falls back to re-reading the active row, so concurrent edits or historical dirty data do not roll back the transaction
- `router.refresh()` usage reworked; invalidation narrowed from global to the vendor dimension; the vendor view's endpoint-pool section mounts lazily when it comes into view
- `src/repository/provider-endpoints-batch.ts` drops the top-level `"use server"` in favour of `server-only` (so it cannot be accidentally exposed as a Server Action)

Compatibility / upgrade
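The hover/sparkline batching above collects individual requests into short windows before hitting the server. A minimal, illustrative sketch of that idea (the class name echoes the `ProbeLogsBatcher` participant in the sequence diagram further below; the 10 ms window and error handling here are assumptions, not the repository's actual implementation):

```typescript
type Pending = { endpointId: number; resolve: (logs: unknown[]) => void; reject: (e: unknown) => void };

// Collect individual load() calls for ~10 ms, then issue one batch request for all of them.
class ProbeLogsBatcher {
  private pending: Pending[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(private fetchBatch: (ids: number[]) => Promise<Map<number, unknown[]>>) {}

  load(endpointId: number): Promise<unknown[]> {
    return new Promise((resolve, reject) => {
      this.pending.push({ endpointId, resolve, reject });
      this.timer ??= setTimeout(() => void this.flush(), 10);
    });
  }

  private async flush(): Promise<void> {
    const batch = this.pending;
    this.pending = [];
    this.timer = null;
    try {
      const byId = await this.fetchBatch(batch.map((p) => p.endpointId));
      for (const p of batch) p.resolve(byId.get(p.endpointId) ?? []);
    } catch (e) {
      for (const p of batch) p.reject(e);
    }
  }
}
```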
- `idx_provider_endpoints_pick_enabled`: speeds up the runtime endpoint-selection hot path (locate by `vendor_id + provider_type + is_enabled`, ordered scan on `sort_order`; a DDL sketch follows this list)
- `idx_providers_vendor_type_url_active`: speeds up the reference check when a provider URL is edited (exact vendor/type/url match)
- `idx_providers_enabled_vendor_type`: speeds up the enabled vendor/type DISTINCT used by the dashboard and probe scheduler (partial: `deleted_at IS NULL AND is_enabled = true AND provider_vendor_id > 0`)
- `idx_message_request_key_created_at_id`: speeds up key-scoped pagination and time-range scans for my-usage / usage logs (`key + created_at desc + id desc`, partial `deleted_at IS NULL`)
- `idx_message_request_key_model_active`: speeds up the my-usage filter dropdown's `DISTINCT model` (per key, partial, excludes warmup)
- `idx_message_request_key_endpoint_active`: speeds up the my-usage filter dropdown's `DISTINCT endpoint` (per key, partial, excludes warmup)
- `idx_message_request_created_at_id_active`: speeds up keyset pagination for admin usage logs (`created_at desc + id desc`, partial `deleted_at IS NULL`)
- `idx_message_request_model_active`: speeds up the admin usage-log filter's `DISTINCT model` (partial `deleted_at IS NULL AND model IS NOT NULL`)
- `idx_message_request_status_code_active`: speeds up the admin usage-log filter's `DISTINCT status_code` (partial `deleted_at IS NULL AND status_code IS NOT NULL`)
- With `AUTO_MIGRATE=true` the migration is applied automatically
- Without `AUTO_MIGRATE`: run `bun run db:migrate` (or the Drizzle CLI) after upgrading
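For illustration, the first index above corresponds roughly to the following DDL, wrapped here in a Drizzle `sql` template (column order taken from the description; the generated migration may differ):

```typescript
import { sql } from "drizzle-orm";

// Roughly the DDL behind idx_provider_endpoints_pick_enabled, per the description above:
// locate rows by (vendor_id, provider_type, is_enabled) and scan them in sort_order.
export const pickEnabledIndexDdl = sql`
  CREATE INDEX IF NOT EXISTS idx_provider_endpoints_pick_enabled
  ON provider_endpoints (vendor_id, provider_type, is_enabled, sort_order)
`;
```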
Local verification

- `bun run lint` (only the pre-existing useExhaustiveDependencies warnings)
- `bun run lint:fix`
- `bun run typecheck`
- `bun run test`
- `bun run build`
- `bun run db:generate` (generates the index migration; the migration idempotency check passes)

Greptile Overview
Greptile Summary
comprehensive performance optimization addressing N+1 queries and request storms in Providers system through batch processing, in-view loading, and Redis pipelining.
Key improvements:
- `useInViewOnce` with shared IntersectionObserver pools defers sparkline queries until a row is within 200px of the viewport (a sketch of such a hook follows this list)
- New indexes speed up runtime endpoint selection (`idx_provider_endpoints_pick_enabled`), usage log pagination (`idx_message_request_key_created_at_id`), and probe scheduler queries
- The top-level `"use server"` was removed from the batch repository file and replaced with `server-only`
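A minimal sketch of an in-view-once hook like the one described above (no shared observer pool; the 200px `rootMargin` mirrors the figure in the summary; this is not the repository's actual `useInViewOnce`):

```typescript
import { useEffect, useRef, useState } from "react";

// Fires once when the element first enters the viewport (plus a 200px margin),
// then disconnects; isInView never transitions back to false.
export function useInViewOnceSketch<T extends Element>(rootMargin = "200px") {
  const ref = useRef<T | null>(null);
  const [isInView, setIsInView] = useState(false);

  useEffect(() => {
    const el = ref.current;
    if (!el || isInView) return;
    const observer = new IntersectionObserver(
      (entries) => {
        if (entries.some((e) => e.isIntersecting)) {
          setIsInView(true);
          observer.disconnect();
        }
      },
      { rootMargin }
    );
    observer.observe(el);
    return () => observer.disconnect();
  }, [isInView, rootMargin]);

  return { ref, isInView };
}
```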
AUTO_MIGRATE=truereceive automatic migration on restartbun run db:migratebefore starting new versionConfidence Score: 5/5
Important Files Changed
LATERALjoins for optimal performanceSequence Diagram
sequenceDiagram participant UI as Provider UI participant Hook as useInViewOnce participant Batcher as ProbeLogsBatcher participant API as Batch API participant Repo as Repository participant Redis as Redis Pipeline participant DB as PostgreSQL Note over UI,DB: Request Storm Prevention UI->>Hook: mount sparkline row Hook->>Hook: wait for in-view (200px margin) Hook-->>UI: isInView=true UI->>Batcher: load(endpointId, limit) Note over Batcher: 10ms micro-batch window UI->>Batcher: load(endpointId2, limit) UI->>Batcher: load(endpointId3, limit) Batcher->>Batcher: flush after 10ms Batcher->>API: POST /batchGetProviderEndpointProbeLogs Note over API: chunks of 500 IDs API->>Repo: findProviderEndpointProbeLogsBatch Repo->>DB: LATERAL join per endpoint DB-->>Repo: logs by endpoint_id Repo-->>API: Map<endpointId, logs[]> API-->>Batcher: batch response Batcher->>Batcher: resolve all pending promises Batcher-->>UI: logs for each endpoint Note over UI,DB: Circuit Breaker State Loading UI->>Repo: getAllEndpointHealthStatusAsync Repo->>Redis: pipeline.hgetall (batch 200) Redis-->>Repo: state map Repo->>Repo: sync to memory Repo-->>UI: health status mapLast reviewed commit: fdc1470