
Providers: performance and stability optimization (#779) [WIP] #782

Open
tesgth032 wants to merge 46 commits into ding113:dev from tesgth032:issue/779-provider-performance

Conversation

@tesgth032
Contributor

@tesgth032 tesgth032 commented Feb 13, 2026

Related: ding113/claude-code-hub#779, ding113/claude-code-hub#781

Background / problem

  • The Providers list/detail views have obvious N+1 and duplicated requests (fetching per vendor / per endpoint); as data volume grows this keeps slowing the frontend down and produces a perceived "request storm"
  • The endpoint pool shows instability such as "endpoint does not exist / cannot test" (multi-instance cache desync, insufficient batch capability driving peak concurrency too high, etc.)

Main changes (keeping UX unchanged as much as possible)

  • Providers list endpoint counts: use batchGetVendorTypeEndpointStats for batch counting of enabled endpoints; endpoint details and circuit state are fetched only when the tooltip opens
  • Endpoint table sparkline (probe logs): triggered only in the visible area (in-view); 10 ms micro-batch coalescing (see the sketch after this list); prefers the batch HTTP route batchGetProviderEndpointProbeLogs (allows partially successful chunks to coexist with 404s/failures, degrading on demand); supports removing requests from the batch on query cancel/unmount to cut pointless flushes
  • Circuit state: Redis pipeline batch reads + getAllEndpointHealthStatusAsync memory sync; runtime endpoint selection also switches to batch reads of circuit state, reducing Redis round-trips
  • Endpoint change stability: best-effort resetEndpointCircuit after successful edit/delete/manual probe; probe records are silently ignored when the endpoint is concurrently deleted (avoiding FK failures that abort the task)
  • Endpoint sync race hardening: syncProviderEndpointOnProviderEdit prefers reading the active row; on 23505 during revive it degrades to re-reading the active row, avoiding transaction rollbacks from concurrency/historical dirty data
  • Removal of the refresh amplifier: the Providers admin page's main operations drop router.refresh(); invalidation narrows from global to the vendor dimension; the Vendor view's endpoint pool section gains in-view deferred mounting
  • Hardening of similar performance regressions: the provider statistics SQL drops a potential Cartesian product; my-usage/usage-logs queries use fewer joins and round-trips, and the my-usage logs total count gets a short-TTL cache to reduce repeated COUNTs during paging/polling
  • Security tightening: src/repository/provider-endpoints-batch.ts drops the top-level "use server" in favor of server-only (avoiding accidental exposure as a Server Action)
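To make the 10 ms micro-batch coalescing above concrete, here is a minimal TypeScript sketch of the pattern. It is illustrative only: the PR's actual ProbeLogsBatcher additionally groups by limit and supports cancellation, and loadProbeLogsBatch is a hypothetical batch fetcher, not a real function in this repo.

// Minimal sketch of 10 ms micro-batch coalescing. Requests arriving within
// the window share a single batch call; `loadProbeLogsBatch` is hypothetical.
type Resolver = (logs: unknown[]) => void;

declare function loadProbeLogsBatch(ids: number[]): Promise<Map<number, unknown[]>>;

class MicroBatcher {
  private pending = new Map<number, Resolver[]>();
  private timer: ReturnType<typeof setTimeout> | null = null;

  load(endpointId: number): Promise<unknown[]> {
    return new Promise((resolve) => {
      const list = this.pending.get(endpointId) ?? [];
      list.push(resolve);
      this.pending.set(endpointId, list);
      if (this.timer === null) {
        // Coalesce everything requested within a 10 ms window into one call.
        this.timer = setTimeout(() => {
          this.timer = null;
          this.flush().catch(() => {}); // top-level guard, as the review suggested
        }, 10);
      }
    });
  }

  private async flush(): Promise<void> {
    const snapshot = this.pending; // snapshot-then-clear: new requests open
    this.pending = new Map();      // a fresh window while this one resolves
    const byId = await loadProbeLogsBatch([...snapshot.keys()]);
    for (const [id, resolvers] of snapshot) {
      for (const resolve of resolvers) resolve(byId.get(id) ?? []);
    }
  }
}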

Compatibility / upgrade

  • This PR includes index migrations (no table structure changes, new indexes only; a schema-layer sketch of a partial index follows this list):
    • idx_provider_endpoints_pick_enabled: speeds up the runtime endpoint selection hot path (locate by vendor_id + provider_type + is_enabled, ordered scan on sort_order)
    • idx_providers_vendor_type_url_active: speeds up the reference check when editing a provider URL (exact match on vendor/type/url)
    • idx_providers_enabled_vendor_type: speeds up the Dashboard/Probe scheduler's enabled vendor/type DISTINCT (partial: deleted_at IS NULL AND is_enabled = true AND provider_vendor_id > 0)
    • idx_message_request_key_created_at_id: speeds up key-scoped pagination and time-range scans for my-usage/usage logs (key + created_at desc + id desc, partial: deleted_at IS NULL)
    • idx_message_request_key_model_active: speeds up the my-usage dropdown filter's DISTINCT model (key-scoped, partial: excludes warmup)
    • idx_message_request_key_endpoint_active: speeds up the my-usage dropdown filter's DISTINCT endpoint (key-scoped, partial: excludes warmup)
    • idx_message_request_created_at_id_active: speeds up keyset pagination for admin usage logs (created_at desc + id desc, partial: deleted_at IS NULL)
    • idx_message_request_model_active: speeds up the admin usage logs filter's DISTINCT model (partial: deleted_at IS NULL AND model IS NOT NULL)
    • idx_message_request_status_code_active: speeds up the admin usage logs filter's DISTINCT status_code (partial: deleted_at IS NULL AND status_code IS NOT NULL)
  • Docker users: pull the latest image and restart; the default AUTO_MIGRATE=true applies migrations automatically
  • If production has AUTO_MIGRATE disabled: run bun run db:migrate (or the Drizzle CLI) after upgrading to apply the migrations
  • The frontend tolerates 404 on the batch HTTP endpoint (avoiding brief unavailability during rolling deploys)
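For readers unfamiliar with partial indexes in this stack, the following is a minimal sketch of how an index like idx_provider_endpoints_pick_enabled can be declared at the schema layer. It assumes a Drizzle version whose pg-core index builder supports .where() for partial indexes; the column set is abbreviated and illustrative, not the project's actual schema definition.

// Illustrative Drizzle schema fragment with a partial index (abbreviated).
import { sql } from "drizzle-orm";
import { boolean, index, integer, pgTable, serial, text, timestamp } from "drizzle-orm/pg-core";

export const providerEndpoints = pgTable(
  "provider_endpoints",
  {
    id: serial("id").primaryKey(),
    vendorId: integer("vendor_id").notNull(),
    providerType: text("provider_type").notNull(),
    sortOrder: integer("sort_order").notNull().default(0),
    isEnabled: boolean("is_enabled").notNull().default(true),
    deletedAt: timestamp("deleted_at"),
  },
  (t) => [
    // Partial index: only rows the hot path can actually pick, keeping the
    // index small and allowing an ordered scan on sort_order.
    index("idx_provider_endpoints_pick_enabled")
      .on(t.vendorId, t.providerType, t.sortOrder)
      .where(sql`${t.isEnabled} = true AND ${t.deletedAt} IS NULL`),
  ]
);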

Local verification

  • bun run lint (only pre-existing useExhaustiveDependencies warnings)
  • bun run lint:fix
  • bun run typecheck
  • bun run test
  • bun run build
  • bun run db:generate (generates the index migrations; idempotency check passed)

Greptile Overview

Greptile Summary

Comprehensive performance optimization addressing N+1 queries and request storms in the Providers system through batch processing, in-view loading, and Redis pipelining.

Key improvements:

  • Batch endpoint stats/probe logs via new HTTP endpoints with 500-ID chunks and graceful 404 fallback for rolling deployments
  • 10ms micro-batching in UI coalesces rapid scrolling into single batch request
  • useInViewOnce with shared IntersectionObserver pools defers sparkline queries until 200px from viewport
  • Redis pipeline loads circuit breaker states in batches of 200, reducing round-trips from N to ceiling(N/200) (see the pipeline sketch after this list)
  • 9 new database indexes optimize endpoint selection (idx_provider_endpoints_pick_enabled), usage log pagination (idx_message_request_key_created_at_id), and probe scheduler queries
  • 5-minute TTL cache for usage log total counts reduces repeated COUNT queries during pagination
  • Endpoint sync race conditions hardened with active row priority and 23505 duplicate key retry logic
  • Security: removed top-level "use server" from batch repository file, replaced with server-only
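A minimal sketch of the pipelined, chunked read pattern described above, assuming an ioredis-style client; the key naming and state shape are assumptions for illustration, not the PR's actual implementation.

// Illustrative chunked pipeline read of circuit-breaker hashes.
import Redis from "ioredis";

const CHUNK_SIZE = 200;

async function loadCircuitStates(
  redis: Redis,
  endpointIds: number[]
): Promise<Map<number, Record<string, string>>> {
  const states = new Map<number, Record<string, string>>();
  for (let i = 0; i < endpointIds.length; i += CHUNK_SIZE) {
    const chunk = endpointIds.slice(i, i + CHUNK_SIZE);
    const pipeline = redis.pipeline();
    for (const id of chunk) pipeline.hgetall(`endpoint:circuit:${id}`); // assumed key scheme
    // One round-trip per chunk instead of one per endpoint.
    const results = (await pipeline.exec()) ?? [];
    chunk.forEach((id, idx) => {
      const [err, data] = results[idx] ?? [null, null];
      // Error tolerance: skip failed or empty entries instead of aborting the batch.
      if (!err && data && Object.keys(data as object).length > 0) {
        states.set(id, data as Record<string, string>);
      }
    });
  }
  return states;
}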

Migration notes:

  • Includes 6 index migrations (0068-0073) with IF NOT EXISTS for safe idempotent rollout
  • Docker users with AUTO_MIGRATE=true receive automatic migration on restart
  • Manual deployments require bun run db:migrate before starting the new version

Confidence Score: 5/5

  • safe to merge - well-tested performance optimization with comprehensive test coverage and idempotent migrations
  • all previous threads addressed, extensive test suite covers edge cases (race conditions, concurrent requests, DST handling), defensive patterns throughout (error tolerance in batch Redis, 404 fallback for rolling deploys, defensive slicing), security improved by removing "use server" exposure
  • no files require special attention - thorough review of previous threads shows all raised concerns were intentional design decisions or already addressed

Important Files Changed

Filename Overview
src/repository/provider-endpoints-batch.ts batch endpoint stats and probe logs queries using LATERAL joins for optimal performance
src/lib/redis/endpoint-circuit-breaker-state.ts batch Redis pipeline for loading circuit breaker states with error tolerance pattern
src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx micro-batching probe log requests with abort support and intelligent batch/fallback routing
src/lib/hooks/use-in-view-once.ts shared IntersectionObserver pooling to reduce resource usage across components
src/repository/provider.ts provider statistics with in-flight dedup, endpoint sync logic updated for race condition handling
src/repository/usage-logs.ts slim query with warmup exclusion and new indexes for improved pagination performance
src/actions/my-usage.ts usage log actions updated to use slim query with all filters including sessionId
drizzle/0073_minor_franklin_richards.sql adds partial index for enabled providers query optimization in probe scheduler
drizzle/0072_absurd_gwen_stacy.sql adds composite index for runtime endpoint selection hot path with sort_order
src/lib/endpoint-circuit-breaker.ts batch loading circuit states from Redis with proper memory/Redis state synchronization
src/repository/statistics.ts adds 5-minute TTL cache for total usage count queries with FIFO eviction
src/lib/provider-endpoints/endpoint-selector.ts batch circuit state loading reduces Redis round-trips in endpoint selection

Sequence Diagram

sequenceDiagram
    participant UI as Provider UI
    participant Hook as useInViewOnce
    participant Batcher as ProbeLogsBatcher
    participant API as Batch API
    participant Repo as Repository
    participant Redis as Redis Pipeline
    participant DB as PostgreSQL

    Note over UI,DB: Request Storm Prevention

    UI->>Hook: mount sparkline row
    Hook->>Hook: wait for in-view (200px margin)
    Hook-->>UI: isInView=true
    
    UI->>Batcher: load(endpointId, limit)
    Note over Batcher: 10ms micro-batch window
    UI->>Batcher: load(endpointId2, limit)
    UI->>Batcher: load(endpointId3, limit)
    
    Batcher->>Batcher: flush after 10ms
    Batcher->>API: POST /batchGetProviderEndpointProbeLogs
    Note over API: chunks of 500 IDs
    
    API->>Repo: findProviderEndpointProbeLogsBatch
    Repo->>DB: LATERAL join per endpoint
    DB-->>Repo: logs by endpoint_id
    Repo-->>API: Map<endpointId, logs[]>
    API-->>Batcher: batch response
    
    Batcher->>Batcher: resolve all pending promises
    Batcher-->>UI: logs for each endpoint
    
    Note over UI,DB: Circuit Breaker State Loading
    
    UI->>Repo: getAllEndpointHealthStatusAsync
    Repo->>Redis: pipeline.hgetall (batch 200)
    Redis-->>Repo: state map
    Repo->>Repo: sync to memory
    Repo-->>UI: health status map

Last reviewed commit: fdc1470

@coderabbitai

coderabbitai bot commented Feb 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This change introduces endpoint probe and per-vendor/type batch statistics/log APIs, supports batch reads by ID with circuit health-state synchronization (including Redis batch loading and state transitions), adds a first-in-view lazy-loading hook, adds server-side slim usage-log and quota aggregation, and moves the frontend to precise react-query cache invalidation and on-demand loading.

Changes

Cohort / File(s) Summary
Configuration
biome.json
Update the JSON schema version 2.3.14 → 2.3.15.
Backend actions and routes
src/actions/my-usage.ts, src/actions/provider-endpoints.ts, src/app/api/actions/[...route]/route.ts
Add the admin batch actions (batchGetProviderEndpointProbeLogs, batchGetVendorTypeEndpointStats); switch my-usage to the slim usage-log query and the quota aggregation API; harden foreign-key/cache-invalidation error tolerance.
Batch repositories and statistics
src/repository/provider-endpoints-batch.ts, src/repository/usage-logs.ts, src/repository/statistics.ts, src/repository/provider.ts, src/repository/provider-endpoints.ts, src/repository/index.ts
Add batch repository functions (probe logs / vendor-type stats), findUsageLogsForKeySlim, distinct-value caches, quota-cost aggregation (sumUserQuotaCosts/sumKeyQuotaCostsById), and the export/addition of findEnabledProviderEndpointsByVendorAndType plus probe-log write guarding.
Circuit breaker and Redis batch loading
src/lib/endpoint-circuit-breaker.ts, src/lib/redis/endpoint-circuit-breaker-state.ts, src/lib/provider-endpoints/endpoint-selector.ts
Add getAllEndpointHealthStatusAsync and loadEndpointCircuitStates, implementing Redis batch loading merged with in-memory state; endpoint selection now judges availability from batch health status and uses the enabled-endpoints query.
Frontend components and in-view lazy loading
src/lib/hooks/use-in-view-once.ts, src/app/[locale]/settings/providers/_components/...
Add the useInViewOnce hook; several components (EndpointLatencySparkline, ProviderEndpointHover, ProviderEndpointsTable, ProviderRichListItem, etc.) switch to batch probe logs and batch circuit state, support deferUntilInView deferred loading, drop router.refresh, and use precise invalidateQueries.
Routes and API exposure
src/app/api/actions/[...route]/route.ts
Register and expose two admin-only batch routes with OpenAPI descriptions: batchGetProviderEndpointProbeLogs and batchGetVendorTypeEndpointStats.
Redis / Drizzle schema and indexes
drizzle/0067_fuzzy_quasar.sql, src/drizzle/schema.ts, drizzle/meta/_journal.json
Add partial indexes on providers and provider_endpoints to optimize non-deleted-row queries; add the migration journal entry.
Endpoint selector and probe write behavior
src/lib/provider-endpoints/endpoint-selector.ts, src/repository/provider-endpoints.ts
Endpoint availability is now judged via batch health queries; add a find-enabled-endpoints function; recordProviderEndpointProbeResult writes the probe log only after the update confirms the row exists, avoiding FK problems.
Tests
tests/unit/...
Update and add many unit tests and mocks (replace findUsageLogsWithDetails with findUsageLogsForKeySlim, add batch mocks, a getAllEndpointHealthStatusAsync mock, and quota aggregation mocks), and add probe-result unit tests covering the new behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Merge Conflict Detection ⚠️ Warning ❌ Merge conflicts detected (46 files):

⚔️ biome.json (content)
⚔️ drizzle/meta/_journal.json (content)
⚔️ scripts/clear-session-bindings.ts (content)
⚔️ src/actions/my-usage.ts (content)
⚔️ src/actions/provider-endpoints.ts (content)
⚔️ src/actions/users.ts (content)
⚔️ src/app/[locale]/settings/providers/_components/add-provider-dialog.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/forms/provider-form/index.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/provider-endpoints-table.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/provider-list.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/provider-rich-list-item.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/provider-vendor-view.tsx (content)
⚔️ src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx (content)
⚔️ src/app/v1/_lib/proxy/rate-limit-guard.ts (content)
⚔️ src/app/v1/_lib/proxy/response-handler.ts (content)
⚔️ src/app/v1/_lib/proxy/session-guard.ts (content)
⚔️ src/drizzle/schema.ts (content)
⚔️ src/lib/endpoint-circuit-breaker.ts (content)
⚔️ src/lib/provider-endpoints/endpoint-selector.ts (content)
⚔️ src/lib/proxy-agent/agent-pool.ts (content)
⚔️ src/lib/rate-limit/concurrent-session-limit.ts (content)
⚔️ src/lib/rate-limit/service.ts (content)
⚔️ src/lib/redis/endpoint-circuit-breaker-state.ts (content)
⚔️ src/lib/redis/lua-scripts.ts (content)
⚔️ src/lib/session-manager.ts (content)
⚔️ src/lib/session-tracker.ts (content)
⚔️ src/repository/index.ts (content)
⚔️ src/repository/provider-endpoints.ts (content)
⚔️ src/repository/provider.ts (content)
⚔️ src/repository/statistics.ts (content)
⚔️ src/repository/usage-logs.ts (content)
⚔️ tests/unit/actions/my-usage-concurrent-inherit.test.ts (content)
⚔️ tests/unit/actions/my-usage-date-range-dst.test.ts (content)
⚔️ tests/unit/actions/provider-endpoints.test.ts (content)
⚔️ tests/unit/actions/total-usage-semantics.test.ts (content)
⚔️ tests/unit/lib/endpoint-circuit-breaker.test.ts (content)
⚔️ tests/unit/lib/provider-endpoints/endpoint-selector.test.ts (content)
⚔️ tests/unit/lib/proxy-agent/agent-pool.test.ts (content)
⚔️ tests/unit/lib/rate-limit/concurrent-session-limit.test.ts (content)
⚔️ tests/unit/lib/rate-limit/service-extra.test.ts (content)
⚔️ tests/unit/lib/session-tracker-cleanup.test.ts (content)
⚔️ tests/unit/lib/session-ttl-validation.test.ts (content)
⚔️ tests/unit/proxy/rate-limit-guard.test.ts (content)
⚔️ tests/unit/settings/providers/provider-rich-list-item-endpoints.test.tsx (content)

These conflicts must be resolved before merging into dev.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title is in Chinese and broadly reflects the PR's main direction (performance and stability optimization) but does not detail the specific optimizations; it is partially related.
Description check ✅ Passed The PR description details the background problems, main changes, performance measures, compatibility handling, and local verification steps, and is highly relevant to the changeset.


@gemini-code-assist
Contributor

Summary of Changes

Hello @tesgth032, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request comprehensively optimizes the performance and stability of the Providers feature. By resolving N+1 query problems, reducing duplicate requests, introducing batch data loading, and refining the frontend rendering strategy, it significantly improves UI responsiveness and backend load capacity. Database queries are also deeply optimized to keep the system scalable and robust as data volume grows, giving users a smoother, more reliable Providers management experience.

Highlights

  • Providers list performance: endpoint counts in the Providers list now use batchGetVendorTypeEndpointStats for batch statistics, and endpoint details and circuit state load on demand only when the tooltip opens, significantly reducing the initial request volume.
  • Endpoint probe-log sparkline: the endpoint table's sparkline (probe logs) now loads only in the visible area and coalesces requests into 10 ms micro-batches. It prefers the batch HTTP API batchGetProviderEndpointProbeLogs and automatically degrades to per-item actions (concurrency ≤ 4) on 404 or errors, avoiding request storms (a fallback sketch follows this list).
  • Batch circuit-state reads and sync: circuit state is now read in batches via a Redis pipeline and synchronized into memory with getAllEndpointHealthStatusAsync({ forceRefresh: true }). The frontend chunks batch requests into 500-ID slices to stay under limits.
  • Global refresh removal and finer cache invalidation: the Providers admin page's main operations drop router.refresh() calls, and cache invalidation narrows from global to the vendor dimension. The Vendor view's endpoint pool section gains in-view deferred mounting, further improving page-load performance.
  • Database query improvements: the provider statistics SQL drops a potential Cartesian product; my-usage and usage-logs queries use fewer JOINs and round-trips, improving retrieval efficiency.
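A minimal sketch of the batch-first fetch with 404 fallback described above. The route path and response shape are assumptions for illustration; the real code also remembers endpoint availability and caps fallback concurrency at 4.

// Illustrative batch fetch that degrades to per-endpoint requests on 404/errors.
async function fetchProbeLogs(
  endpointIds: number[],
  fetchOne: (id: number) => Promise<unknown[]> // per-endpoint action (assumed)
): Promise<Record<number, unknown[]>> {
  try {
    const res = await fetch("/api/actions/providers/batchGetProviderEndpointProbeLogs", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ endpointIds }),
    });
    if (res.ok) return (await res.json()) as Record<number, unknown[]>;
    // Non-OK (e.g. 404 during a rolling deploy): fall through to per-item mode.
  } catch {
    // Network errors also degrade to per-endpoint requests.
  }
  const entries = await Promise.all(
    endpointIds.map(async (id) => [id, await fetchOne(id).catch(() => [])] as const)
  );
  return Object.fromEntries(entries);
}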
Changelog
  • biome.json
    • Updated the Biome schema version.
  • src/actions/my-usage.ts
    • Removed the keysTable import.
    • Replaced findUsageLogsWithDetails with the more optimized findUsageLogsForKeySlim.
    • Deleted the deprecated sumUserCost function.
    • Updated getMyQuota to use sumUserTotalCost and sumUserCostInTimeRange directly.
    • Updated getMyUsageLogs to use findUsageLogsForKeySlim.
    • Optimized getUserConcurrentSessions to use SessionTracker.getUserSessionCount directly, avoiding an N+1 query.
  • src/actions/provider-endpoints.ts
    • Added the getAllEndpointHealthStatusAsync import.
    • Added imports of the batch repository functions findProviderEndpointProbeLogsBatch and findVendorTypeEndpointStatsBatch.
    • Defined new Zod schemas for batch requests (BatchGetVendorTypeEndpointStatsSchema, BatchGetProviderEndpointProbeLogsBatchSchema).
    • Added an isForeignKeyViolationError helper.
    • Added publishProviderCacheInvalidation calls after addProviderEndpoint, editProviderEndpoint, removeProviderEndpoint, and editProviderVendor.
    • Implemented the batchGetProviderEndpointProbeLogs and batchGetVendorTypeEndpointStats batch operations.
    • Updated batchGetEndpointCircuitInfo to fetch circuit state in batch via getAllEndpointHealthStatusAsync.
  • src/app/[locale]/settings/providers/_components/add-provider-dialog.tsx
    • Removed the useRouter import and the router.refresh() call.
    • Added cache invalidation for the providers-statistics and provider-vendors queries.
  • src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx
    • Added the useInViewOnce hook.
    • Introduced the ProbeLog type.
    • Implemented batch probe-log fetching (normalizeProbeLogsByEndpointId, tryFetchBatchProbeLogsByEndpointIds, fetchProbeLogsByEndpointIds, ProbeLogsBatcher).
    • Queries are enabled only when the element enters the visible area.
    • Updated rendering to use the ref from useInViewOnce.
  • src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx
    • Imported batchGetEndpointCircuitInfo and batchGetVendorTypeEndpointStats.
    • Implemented batched fetching of vendor-type endpoint stats (requestVendorTypeEndpointStatsBatched, flushVendorTypeEndpointStats).
    • Updated ProviderEndpointHover to use the batch stats and circuit info.
    • Removed EndpointRow's standalone fetch of per-endpoint circuit info.
    • Added an endpoint loading-state display.
  • src/app/[locale]/settings/providers/_components/provider-endpoints-table.tsx
    • Added the useInViewOnce hook.
    • Updated batchGetEndpointCircuitInfo to chunk internally (500 per batch).
    • Changed the query invalidation key from the global provider-endpoints to the vendor-specific provider-endpoints/[vendorId].
    • Added a deferUntilInView prop to ProviderEndpointsSection to render ProviderEndpointsTable conditionally on visibility.
  • src/app/[locale]/settings/providers/_components/provider-list.tsx
    • Added a useQuery to fetch getProviderVendors data.
    • Passes the vendor prop to ProviderRichListItem.
  • src/app/[locale]/settings/providers/_components/provider-rich-list-item.tsx
    • Removed the useRouter and getProviderVendors imports.
    • Accepts a vendor prop.
    • Removed router.refresh() calls after various operations (delete, circuit reset, usage reset, toggle, edit, group edit, clone).
  • src/app/[locale]/settings/providers/_components/provider-vendor-view.tsx
    • Applied the deferUntilInView prop to ProviderEndpointsSection.
  • src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx
    • Removed the useRouter import and the router.refresh() call.
    • Added cache invalidation for the providers-health and providers-statistics queries.
  • src/lib/endpoint-circuit-breaker.ts
    • Added the loadEndpointCircuitStates import.
    • Implemented getAllEndpointHealthStatusAsync to batch-fetch and refresh circuit states from Redis.
  • src/lib/hooks/use-in-view-once.ts
    • Added the use-in-view-once file, providing a hook for deferred loading.
  • src/lib/redis/endpoint-circuit-breaker-state.ts
    • Implemented loadEndpointCircuitStates to batch-fetch multiple circuit-breaker states via a Redis pipeline.
  • src/repository/provider-endpoints-batch.ts
    • Added the provider-endpoints-batch.ts file containing the findVendorTypeEndpointStatsBatch and findProviderEndpointProbeLogsBatch batch database query functions.
  • src/repository/provider.ts
    • Added an in-memory cache to getProviderStatistics.
    • Optimized the getProviderStatistics SQL query, avoiding a Cartesian product and using timezone-aware filtering.
  • src/repository/statistics.ts
    • Implemented getKeyStringByIdCached with a short-lived in-memory cache to reduce key-string database lookups.
    • Updated sumKeyTotalCostById, sumKeyCostInTimeRange, findKeyCostEntriesInTimeRange, getRateLimitEventStats to use getKeyStringByIdCached.
  • src/repository/usage-logs.ts
    • Introduced findUsageLogsForKeySlim for a more optimized usage-log query.
    • Added in-memory caches to getDistinctModelsForKey and getDistinctEndpointsForKey.
  • tests/unit/actions/my-usage-date-range-dst.test.ts
    • Updated tests to use the findUsageLogsForKeySlim mock.
  • tests/unit/actions/provider-endpoints.test.ts
    • Updated tests to use the getAllEndpointHealthStatusAsync mock for batch circuit info fetching.
  • tests/unit/actions/total-usage-semantics.test.ts
    • Updated the regex used for source-code verification.
  • tests/unit/settings/providers/provider-rich-list-item-endpoints.test.tsx
    • Added the vendor prop to ProviderRichListItem in tests.

@github-actions github-actions bot added enhancement New feature or request area:UI area:provider size/L Large PR (< 1000 lines) labels Feb 13, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request primarily focuses on optimizing data fetching and rendering performance, particularly for provider endpoint information and usage logs. Key changes include refactoring src/actions/my-usage.ts to use a new findUsageLogsForKeySlim function and a more efficient getUserConcurrentSessions method, and removing a deprecated sumUserCost function. A new src/repository/provider-endpoints-batch.ts file was introduced to provide batch fetching capabilities for provider endpoint probe logs and vendor-type endpoint statistics, which are then utilized by new Server Actions in src/actions/provider-endpoints.ts.

The batchGetEndpointCircuitInfo action was updated to use a new getAllEndpointHealthStatusAsync function for batch retrieval of circuit breaker states from Redis, improving efficiency. Client-side components like EndpointLatencySparkline, ProviderEndpointHover, and ProviderEndpointsTable were updated to leverage these new batching actions and introduce a useInViewOnce hook for deferred loading of data when elements enter the viewport, preventing request storms. Additionally, router.refresh() calls were replaced with more granular queryClient.invalidateQueries() calls across several UI components for better cache management. The getProviderStatistics function in src/repository/provider.ts was optimized with a new SQL query structure and an in-memory cache, and src/repository/statistics.ts introduced a cache for keyId to keyString lookups to reduce database queries.

Review comments highlighted critical security vulnerabilities due to the insecure exposure of repository functions as public Server Actions in src/repository/provider-endpoints-batch.ts, src/actions/my-usage.ts (findUsageLogsForKeySlim), and src/repository/provider.ts (getProviderStatistics), which lack proper authentication and authorization. Other feedback included concerns about hardcoded API paths, inefficient useEffect dependencies, and potential cache-busting issues with the keyStringByIdCache implementation.

@@ -0,0 +1,163 @@
"use server";

security-critical critical

The use of the "use server" directive in repository files is highly insecure. This directive exports all functions in the file as public Server Actions, which can be called directly from the client. Since these repository functions (e.g., findVendorTypeEndpointStatsBatch, findProviderEndpointProbeLogsBatch) do not perform any authentication or authorization checks, they are exposed as public endpoints, allowing any user to access potentially sensitive data. Repository functions should be internal helper functions called only by authorized Server Actions located in the src/actions/ directory.

cacheTtlApplied: string | null;
}

export async function findUsageLogsForKeySlim(

security-critical critical

The function findUsageLogsForKeySlim is exported from a file with the "use server" directive, making it a public Server Action. It accepts a keyString parameter and returns usage logs for that key without verifying if the authenticated user is authorized to view them. This creates an Insecure Direct Object Reference (IDOR) vulnerability, where any user can access usage logs for any API key by calling this action directly. You should remove the "use server" directive from this file and ensure that usage logs are only accessible through authorized actions that verify ownership (e.g., getMyUsageLogs in src/actions/my-usage.ts).

data: ProviderStatisticsRow[];
} | null = null;

export async function getProviderStatistics(): Promise<ProviderStatisticsRow[]> {

security-high high

The function getProviderStatistics is exported from a file with the "use server" directive, making it a public Server Action. It returns statistics for all providers without any authorization checks. This allows any user to access potentially sensitive provider performance and cost data. Repository functions should not be exposed as Server Actions; instead, they should be called by authorized actions in the src/actions/ directory that perform role-based access control (e.g., checking for admin role).

const merged: Record<number, ProbeLog[]> = {};

for (const chunk of chunks) {
const res = await fetch("/api/actions/providers/batchGetProviderEndpointProbeLogs", {

high

This fetch calls a hardcoded path /api/actions/providers/batchGetProviderEndpointProbeLogs. In Next.js, Server Actions do not have fixed public URLs like this. If the path corresponds to a Route Handler that has not been implemented yet, this request will always return 404 and trigger the fallback logic. Consider confirming whether the corresponding API route needs to be added, or calling the imported Server Action function directly to use Next.js's native mechanism.


observer.observe(el);
return () => observer.disconnect();
}, [isInView, options]);

medium

The useEffect dependency array includes the options object. Since options is usually passed as an object literal at the call site (e.g. useInViewOnce({ rootMargin: '100px' })), the hook recreates the observer on every render of the component, incurring unnecessary overhead. Consider persisting the configuration with useRef inside the hook, or listing only specific properties of options in the dependencies.

Comment on lines 60 to 62
if (keyStringByIdCache.size >= KEY_STRING_BY_ID_CACHE_MAX_SIZE) {
keyStringByIdCache.clear();
}

medium

When the cache reaches MAX_SIZE with no expired entries, calling keyStringByIdCache.clear() empties the cache instantly. Under high concurrency this can cause a cache stampede, with many requests hitting the database at once. Consider a gentler eviction strategy, such as deleting the oldest ~10% of entries, or using a mature LRU library.
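A minimal sketch of the gentler eviction suggested here. It relies on the fact that JavaScript Maps iterate in insertion order, so the first keys are the oldest entries (assuming entries are never re-inserted on access); the names are illustrative.

// Illustrative partial eviction: drop the oldest ~10% instead of clearing all.
function evictOldestTenth<K, V>(cache: Map<K, V>): void {
  const toEvict = Math.max(1, Math.floor(cache.size / 10));
  let evicted = 0;
  for (const key of cache.keys()) {
    if (evicted >= toEvict) break;
    cache.delete(key);
    evicted++;
  }
}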


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/repository/provider.ts`:
- Around line 900-901: The comment containing an emoji should be cleaned: remove
the "⭐" character from the comment that explains using the last item of
providerChain (provider_chain) to determine the final provider and falling back
to provider_id when provider_chain is empty; keep the rest of the Chinese text
intact and ensure references to providerChain, provider_chain, and provider_id
remain unchanged.
- Around line 976-983: The code currently casts the db.execute() result directly
to ProviderStatisticsRow[] (using "const data = result as unknown as
ProviderStatisticsRow[]"), but db.execute() from postgres-js/drizzle-orm returns
an iterable and other call sites use Array.from(result); replace the direct type
assertion by converting the result with Array.from(result) and then type it as
ProviderStatisticsRow[] before storing it in providerStatisticsCache and
returning it so the handling matches the other db.execute() usages.
🧹 Nitpick comments (18)
src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx (1)

468-474: Invalidating the health and statistics caches after a successful edit is correct.

An edit may change core fields like key and type, so refreshing both health and statistics is reasonable.

One small consistency note: onSuccess (the create flow, lines 121-125) currently only invalidates "providers" and "provider-vendors", not "providers-health" / "providers-statistics". A newly created provider has no health/statistics data yet, so the impact is limited, but if health status should show immediately after creation later, this may need to be filled in.

src/lib/hooks/use-in-view-once.ts (1)

36-36: Using options as an effect dependency causes the observer to be rebuilt repeatedly before the element becomes visible.

If the caller passes a new object literal each render (e.g. useInViewOnce({ rootMargin: "100px" })), reference inequality makes the effect re-run, repeatedly creating/destroying the IntersectionObserver before the element is visible.

Consider serializing options, asking callers to useMemo it, or doing a shallow-compare cache inside the hook:

Optional stabilization approach
+import { useEffect, useRef, useState, useMemo } from "react";
+
 export function useInViewOnce<T extends Element>(options?: IntersectionObserverInit) {
   const ref = useRef<T | null>(null);
   const [isInView, setIsInView] = useState(false);
+  const stableOptions = useRef(options);
+  // shallow compare to avoid re-triggering
+  if (
+    options?.root !== stableOptions.current?.root ||
+    options?.rootMargin !== stableOptions.current?.rootMargin ||
+    options?.threshold !== stableOptions.current?.threshold
+  ) {
+    stableOptions.current = options;
+  }

   useEffect(() => {
     // ...
-  }, [isInView, options]);
+  }, [isInView, stableOptions.current]);
src/repository/usage-logs.ts (1)

284-301: UsageLogSlimFilters and UsageLogSlimRow are not exported, which may affect external type references.

These two interfaces are currently declared with interface but not exported. External callers that need the filter or row types (e.g. to annotate types explicitly in mocks or tests) cannot import them directly.

Right now my-usage.ts passes inline objects and tests bypass this via mocks, so there is no immediate problem. If these types need to be referenced elsewhere later, consider adding export.

src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx (2)

39-41: The module-level deferred Map may hold stale references after a component unmounts.

If a component unmounts before flushVendorTypeEndpointStats runs, its deferred promise's resolve/reject remain in deferredByProviderTypeVendorId. Resolving a promise for an unmounted component will not crash (React Query ignores cancelled queries), but the deferred entries are never cleaned up and could accumulate under extreme mount/unmount churn.

The current setTimeout(fn, 0) window is extremely short, so the practical risk is low. This is only a defensive note.


260-267: res.data may be undefined when fetching circuit state in batch.

When res.ok is true, res.data may still be T | undefined at the type level (depending on how ActionResult is defined). If undefined is actually impossible, consider adding ?? [] as a guard:
-          const res = await batchGetEndpointCircuitInfo({ endpointIds: chunk });
-          return res.ok ? res.data : [];
+          const res = await batchGetEndpointCircuitInfo({ endpointIds: chunk });
+          return res.ok ? (res.data ?? []) : [];
src/actions/provider-endpoints.ts (1)

795-810: forceRefresh: true hits Redis on every call; watch high-frequency scenarios.

batchGetEndpointCircuitInfo uses forceRefresh: true to bypass the in-memory cache and read Redis directly, which suits the admin page's need for real-time circuit state. But if several tooltips open simultaneously or are toggled rapidly, this can generate many Redis requests.

The current staleTime: 1000 * 10 (10 seconds) provides some buffering at the React Query layer. If Redis pressure becomes an issue later, consider enabling forceRefresh only on first load or manual refresh.

src/repository/provider-endpoints-batch.ts (2)

83-111: The raw SQL hardcodes the table name provider_endpoint_probe_logs.

Line 106 uses a string table name rather than a Drizzle schema reference. If the table name changes in the schema definition, this spot will not be updated in sync. Consider obtaining the table name from the Drizzle schema object, or at least adding a comment documenting the correspondence with the schema.

Per the coding guidelines, the repository layer should use Drizzle ORM for data access. Using raw SQL here for the ROW_NUMBER() window function is reasonable, but referencing the schema's table name would improve maintainability.

Suggestion: reference the schema table name

Import the probe logs table's schema definition at the top of the file (if it exists), then reference the table name via Drizzle's getTableName() or a similar mechanism:

+import { providerEndpointProbeLogs } from "@/drizzle/schema";
+import { getTableName } from "drizzle-orm";

然后在 SQL 中使用:

-      FROM provider_endpoint_probe_logs
+      FROM ${sql.identifier(getTableName(providerEndpointProbeLogs))}

If the Drizzle version does not support getTableName, fall back to adding a comment:

// Table name must match: providerEndpointProbeLogs in `@/drizzle/schema`
FROM provider_endpoint_probe_logs

As per coding guidelines: "Use Drizzle ORM for data access in the repository layer".


9-13: toDate silently falling back to new Date() may mask data anomalies.

When value is not a Date, string, or number, the function returns the current time. If the database returns an unexpected type (e.g. null), the log's createdAt is silently replaced with the current time instead of throwing or logging a warning.

Time accuracy matters for the probe log's createdAt field. Consider at least logging at debug level, or pre-processing null values at the call site.

src/lib/endpoint-circuit-breaker.ts (1)

149-161: Redis state sync only updates memory when circuitState changes.

The condition on line 151, redisState.circuitState !== existingHealth.circuitState, means that if circuitState is equal but failureCount differs (e.g. another instance accumulated more failures), the in-memory failureCount is not synced from Redis. For the admin page display, this can make failureCount inconsistent across instances.

The current implementation is consistent with getOrCreateHealth (line 62), and forceRefresh first clears loadedFromRedis entries so the next needsRefresh check is true; but because of the line 151 condition, even forceRefresh will not update a same-state failureCount.

If accurately displaying failureCount across instances is a requirement, consider relaxing to:

-          if (!existingHealth || redisState.circuitState !== existingHealth.circuitState) {
+          if (!existingHealth
+            || redisState.circuitState !== existingHealth.circuitState
+            || redisState.failureCount !== existingHealth.failureCount) {

If the current behavior is intentional (to reduce unnecessary object allocation), feel free to ignore this suggestion.

tests/unit/actions/provider-endpoints.test.ts (1)

13-14: The batch repository mocks are declared but never asserted in any test.

Registering findProviderEndpointProbeLogsBatchMock and findVendorTypeEndpointStatsBatchMock as module mocks is reasonable (it avoids import resolution failures), but no current test case verifies their call behavior. If these batch repository functions are already used at the action layer, consider adding corresponding test coverage.

#!/bin/bash
# Verify whether findProviderEndpointProbeLogsBatch and findVendorTypeEndpointStatsBatch are actually called at the action layer
rg -n --type=ts 'findProviderEndpointProbeLogsBatch|findVendorTypeEndpointStatsBatch' -g '!**/test*' -g '!**/*.test.*'

Also applies to: 57-60

src/app/[locale]/settings/providers/_components/provider-endpoints-table.tsx (1)

141-158: The chunked batch fetch of circuit state is correct; consider extracting a chunking utility.

The chunk-by-500 + Promise.all parallel fetch pattern appears both in this file and in endpoint-latency-sparkline.tsx (same MAX_ENDPOINT_IDS_PER_BATCH and chunking loop). Consider extracting a shared utility function to reduce duplication.

♻️ Optional: extract a shared chunking utility
// e.g. in `@/lib/utils/chunk.ts`
export function chunkArray<T>(arr: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    chunks.push(arr.slice(i, i + size));
  }
  return chunks;
}

Then reference it in the two places:

-      const MAX_ENDPOINT_IDS_PER_BATCH = 500;
-      const chunks: number[][] = [];
-      for (let index = 0; index < endpointIds.length; index += MAX_ENDPOINT_IDS_PER_BATCH) {
-        chunks.push(endpointIds.slice(index, index + MAX_ENDPOINT_IDS_PER_BATCH));
-      }
+      const chunks = chunkArray(endpointIds, 500);
src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx (4)

48-91: normalizeProbeLogsByEndpointId does no runtime validation of the array elements.

At lines 58, 72, and 85, logs is asserted directly to ProbeLog[] without verifying that the elements actually contain the ok and latencyMs fields. If the backend returns an unexpected shape, the downstream .map() produces undefined values rather than throwing, so the sparkline renders incorrectly without an obvious error (silent failure).

Given this is performance-optimization code and the backend format is fairly controlled, this is a defensive-programming suggestion that can be deferred.


93-142: isBatchProbeLogsEndpointAvailable 状态一旦为 false 则整个页面生命周期内永不恢复。

当遇到 404 后,该模块级变量被置为 false,后续所有批量请求均跳过,直到用户刷新页面。这在前后端版本切换场景下是合理的降级策略(PR 描述中也有说明),但如果用户长时间不刷新页面而后端已升级,将持续使用低效的逐条请求模式。

如果需要改进,可以考虑加一个时间窗口(如 5 分钟后重试一次 batch):

♻️ 可选:添加重试时间窗口
-let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let batchProbeLogsDisabledAt: number | undefined;
+const BATCH_RETRY_INTERVAL_MS = 5 * 60 * 1000;

 async function tryFetchBatchProbeLogsByEndpointIds(
   endpointIds: number[],
   limit: number
 ): Promise<Record<number, ProbeLog[]> | null> {
   if (endpointIds.length <= 1) return null;
-  if (isBatchProbeLogsEndpointAvailable === false) return null;
+  if (isBatchProbeLogsEndpointAvailable === false) {
+    if (batchProbeLogsDisabledAt && Date.now() - batchProbeLogsDisabledAt < BATCH_RETRY_INTERVAL_MS) {
+      return null;
+    }
+    isBatchProbeLogsEndpointAvailable = undefined;
+  }
   if (process.env.NODE_ENV === "test") return null;

Also record the time in the 404 branch:

     if (res.status === 404) {
       isBatchProbeLogsEndpointAvailable = false;
+      batchProbeLogsDisabledAt = Date.now();
       return null;
     }

112-135: Batch chunks are processed serially, unlike the parallel strategy in provider-endpoints-table.tsx.

Here for (const chunk of chunks) is serial, whereas circuit-state chunks in provider-endpoints-table.tsx use Promise.all(chunks.map(...)). Serial processing may be intentional for large probe-log payloads (to reduce server load), but it is worth a comment explaining the rationale so future maintainers are not confused.

Also, at line 130, if some middle chunk's normalized result is null, results from already-successful chunks are discarded and the whole batch degrades to per-item requests (those endpoints get re-fetched). Minor, but worth noting.


183-227: The ProbeLogsBatcher micro-batching design is sound; one small defensive suggestion.

The 10 ms setTimeout window + grouping by limit + snapshot-then-clear flush strategy are all good. However, at line 199, void this.flush() would produce an unhandled promise rejection if flush() threw outside its Promise.all (unlikely but possible). Consider adding a .catch:

♻️ Optional: add a top-level catch guard
-      this.flushTimer = setTimeout(() => {
-        this.flushTimer = null;
-        void this.flush();
-      }, delayMs);
+      this.flushTimer = setTimeout(() => {
+        this.flushTimer = null;
+        this.flush().catch(() => {});
+      }, delayMs);
src/repository/provider.ts (3)

870-876: Consider exporting the ProviderStatisticsRow type

getProviderStatistics is exported, but its return type ProviderStatisticsRow is not. Callers cannot reference the type by name and must infer it via ReturnType<> or similar, which hurts type reuse.

Suggested change
-type ProviderStatisticsRow = {
+export type ProviderStatisticsRow = {
   id: number;
   today_cost: string;
   today_calls: number;
   last_call_time: Date | null;
   last_call_model: string | null;
 };

886-898: The cache does not dedupe concurrent requests (thundering herd)

At the instant the cache expires, multiple concurrent calls to getProviderStatistics all pass the cache check (expiresAt > now is false) and each issues an independent DB query. With a 10-second TTL the probability is low, but under high-frequency polling this can still cause brief query spikes.

If such spikes are observed in the future, consider Promise-level deduplication (cache the in-flight query's Promise and have subsequent requests reuse it) rather than caching only the result data; a sketch follows. The current implementation should be acceptable at present scale.
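A minimal sketch of the Promise-level deduplication suggested above; the names and TTL are illustrative, not the repository's actual code.

// Illustrative in-flight dedup: concurrent callers share one query Promise
// instead of each hitting the database when the cache has just expired.
const TTL_MS = 10_000;

let cached: { expiresAt: number; data: unknown[] } | null = null;
let inFlight: Promise<unknown[]> | null = null;

async function getStatisticsDeduped(query: () => Promise<unknown[]>): Promise<unknown[]> {
  if (cached && cached.expiresAt > Date.now()) return cached.data;
  if (inFlight) return inFlight; // reuse the query that is already running
  inFlight = query()
    .then((data) => {
      cached = { expiresAt: Date.now() + TTL_MS, data };
      return data;
    })
    .finally(() => {
      inFlight = null; // allow a fresh query after this one settles
    });
  return inFlight;
}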


903-908: The AT TIME ZONE conversion in the bounds CTE is correct, but repeated three times

The pattern DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone} is repeated three times (today_start, tomorrow_start, last7_start). You could define today_start first and derive the other two arithmetically to reduce repetition:

Simplification suggestion
       WITH bounds AS (
         SELECT
           (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) AS today_start,
-          (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) + INTERVAL '1 day' AS tomorrow_start,
-          (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) - INTERVAL '7 days' AS last7_start
+          (DATE_TRUNC('day', CURRENT_TIMESTAMP AT TIME ZONE ${timezone}) AT TIME ZONE ${timezone}) + INTERVAL '1 day' AS tomorrow_start
+      ),
+      bounds_ext AS (
+        SELECT *, today_start - INTERVAL '7 days' AS last7_start FROM bounds
       ),

Alternatively, keep a single CTE and use PostgreSQL column references (noting that a CTE cannot directly reference sibling columns). The current form is functionally fine; this is purely a readability suggestion.


@github-actions github-actions bot left a comment


Code Review Summary

This is a well-structured performance optimization PR that addresses N+1 query patterns and request storms in the Providers management UI. The changes introduce batch APIs, micro-batching for probe logs, and lazy loading via intersection observers.

PR Size: L

  • Lines changed: 1,840 (1,556 additions, 284 deletions)
  • Files changed: 23

Note: As a Large PR, consider whether this could be split into smaller, more focused changes for easier review:

  1. Backend batch APIs (repository + actions)
  2. Frontend batching/deferred loading hooks
  3. UI component updates

Issues Found

Category Critical High Medium Low
Logic/Bugs 0 0 0 0
Security 0 0 0 0
Error Handling 0 0 0 0
Types 0 0 0 0
Comments/Docs 0 0 0 0
Tests 0 0 0 0
Simplification 0 0 0 0

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean (graceful fallbacks are intentional)
  • Type safety - Clean
  • Documentation accuracy - Clean
  • Test coverage - Adequate (tests updated for new batch functions)
  • Code clarity - Good

Notable Design Decisions

  1. Micro-batching pattern (ProbeLogsBatcher, requestVendorTypeEndpointStatsBatched): Uses module-level state with setTimeout debouncing to coalesce rapid requests. This is a valid pattern for reducing request storms.

  2. Concurrent worker pattern in fetchProbeLogsByEndpointIds: The idx increment is synchronous before any await, making it safe in JavaScript's single-threaded execution model (a sketch of this pattern follows the list).

  3. Graceful degradation: Empty catch blocks in fallback paths intentionally suppress errors to allow fallback to individual API calls when batch endpoints are unavailable (e.g., during rolling deployments).

  4. In-memory caches: TTL-based caches in statistics.ts and usage-logs.ts reduce DB load for frequently accessed data. Cache invalidation is handled via TTL expiration.
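A minimal sketch of the shared-index worker pattern named in item 2 above, with hypothetical names; it caps concurrency without a queue library.

// Illustrative capped-concurrency worker pool sharing a cursor index.
// Reading and incrementing `idx` happen synchronously before any await,
// so two workers can never claim the same element on a single JS thread.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let idx = 0;
  const worker = async () => {
    while (idx < items.length) {
      const current = idx++; // claim a slot synchronously
      results[current] = await fn(items[current]);
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}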


Automated review by Claude AI


@greptile-apps greptile-apps bot left a comment


23 files reviewed, 3 comments


Comment on src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx, lines 266 to 270
const delayMs = process.env.NODE_ENV === "test" ? 0 : 10;
this.flushTimer = setTimeout(() => {
this.flushTimer = null;
void this.flush().catch(() => {});
}, delayMs);

10ms micro-batch window effectively coalesces rapid mount events (scrolling through list) with no perceptible user delay


Comment on src/lib/redis/endpoint-circuit-breaker-state.ts, lines 90 to 105
let errorCount = 0;
const errorSamples: Array<{ endpointId: number; error: string }> = [];

for (let index = 0; index < batchIds.length; index++) {
const endpointId = batchIds[index];
const [error, data] = results[index] ?? [];

if (error) {
errorCount += 1;
if (errorSamples.length < 3) {
errorSamples.push({
endpointId,
error: error instanceof Error ? error.message : String(error),
});
}
continue;

well-designed error tolerance pattern - samples first 3 errors to avoid log spam while batch failures don't abort entire operation


Comment on src/repository/provider-endpoints-batch.ts, lines 148 to 153
// Defensive: ensure per-endpoint limit, even if SQL changes or driver behavior differs.
for (const [endpointId, logs] of map) {
if (logs.length > limitPerEndpoint) {
map.set(endpointId, logs.slice(0, limitPerEndpoint));
}
}

defensive slicing ensures per-endpoint limit even if SQL windowing behavior differs - good safety net



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx`:
- Around line 48-91: normalizeProbeLogsByEndpointId asserts arrays to ProbeLog[] without validating element structure (in the Array.isArray(data) branch, the logsByEndpointId branch, and the items branch), which passes malformed data downstream. Add minimal structural validation in all three branches: ensure the value is an array and check that at least the first (or every) element contains the required fields (e.g. ok, latencyMs/timestamp), filter out entries that do not conform, and assign to the map only after validation passes (or replace invalid/missing-field entries with safe defaults), so the downstream SparkPoint mapping never receives undefined fields.

In `@src/lib/hooks/use-in-view-once.ts`:
- Around line 77-97: The effect in use-in-view-once.ts can return early when
ref.current is null (delayed mount) and never re-run because dependencies omit
the element; update the logic so the effect depends on the actual element (or
use a callback ref) to ensure the observer is created when the element mounts:
either include ref.current (or a local state like observedEl set by a ref
callback) in the effect dependency array and use that element for
observer.observe, or refactor the hook to expose/accept a callback ref that
assigns the element to state and triggers the IntersectionObserver creation;
make sure to still guard for test/IntersectionObserver absence, call
setIsInView(true) when appropriate, and disconnect the observer in the cleanup.
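A minimal sketch of the callback-ref refactor this agent prompt describes, assuming React 18+; it is illustrative, not the hook's actual implementation.

// Illustrative callback-ref variant of useInViewOnce: the observer is created
// once the element actually mounts (even when mounting is delayed), and the
// hook flips to true exactly once.
import { useCallback, useEffect, useState } from "react";

export function useInViewOnce<T extends Element>(options?: IntersectionObserverInit) {
  const [element, setElement] = useState<T | null>(null);
  const [isInView, setIsInView] = useState(false);

  // Callback ref: the state update re-runs the effect when the element appears.
  const ref = useCallback((node: T | null) => setElement(node), []);

  useEffect(() => {
    if (isInView || !element) return;
    if (typeof IntersectionObserver === "undefined") {
      setIsInView(true); // fast path for tests / unsupported environments
      return;
    }
    const observer = new IntersectionObserver((entries) => {
      if (entries.some((e) => e.isIntersecting)) {
        setIsInView(true);
        observer.disconnect();
      }
    }, options);
    observer.observe(element);
    return () => observer.disconnect();
  }, [isInView, element, options]);

  return { ref, isInView };
}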
🧹 Nitpick comments (7)
src/repository/provider.ts (2)

988-992: The cache's expiresAt uses the now captured before the query

now is captured at line 897 (before the query executes) but used at line 990 to compute the cache expiry. If the query is slow, the cache's effective TTL is shortened. Consider using a fresh timestamp when writing the cache:

Suggested change
       providerStatisticsCache = {
         timezone,
-        expiresAt: now + PROVIDER_STATISTICS_CACHE_TTL_MS,
+        expiresAt: Date.now() + PROVIDER_STATISTICS_CACHE_TTL_MS,
         data,
       };

920-962: The final_provider_id CASE expression is duplicated in two CTEs

The CASE WHEN provider_chain IS NULL OR provider_chain = '[]'::jsonb THEN provider_id ELSE (provider_chain->-1->>'id')::int END in provider_stats and latest_call is identical. Consider extracting a shared CTE (e.g. resolved_requests) that computes final_provider_id for the later CTEs to reference, reducing the risk of the two diverging during maintenance.

src/app/[locale]/settings/providers/_components/vendor-keys-compact-list.tsx (1)

121-127: The query invalidation pattern is reasonable; consider extracting it for reuse.

The new providers-health and providers-statistics invalidation logic appears in four places: create / delete / edit / toggle (toggle sensibly omits statistics). Acceptable at the current scale, but maintenance cost grows if more query keys are added. Consider extracting an invalidateProviderQueries(scope) helper.

Example: extract an invalidation helper
// At the top of the component file or in a standalone util
function invalidateProviderQueries(
  queryClient: ReturnType<typeof useQueryClient>,
  scope: "full" | "toggle" = "full",
) {
  queryClient.invalidateQueries({ queryKey: ["providers"] });
  queryClient.invalidateQueries({ queryKey: ["providers-health"] });
  queryClient.invalidateQueries({ queryKey: ["provider-vendors"] });
  if (scope === "full") {
    queryClient.invalidateQueries({ queryKey: ["providers-statistics"] });
  }
}
src/lib/hooks/use-in-view-once.ts (1)

53-64: Reading/writing a ref during render — a hazard under React 19 strict mode.

useStableIntersectionObserverOptions reads and writes stableOptionsRef.current during the render phase (lines 59-60). React 19 strict mode expects render functions to be pure; ref reads/writes during render usually work in practice but are flagged as discouraged by the official React docs. If Strict Mode's double render makes the comparison differ between the two renders, subtle issues could appear.

Optional approach: defer updates via a useRef + useEffect combination, or keep a stable reference with useMemo plus a serialized key. Given the actual usage (options almost never change), the risk is minimal; marked as an optional improvement.

src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx (3)

93-93: The module-level mutable isBatchProbeLogsEndpointAvailable is unrecoverable.

After a 404, isBatchProbeLogsEndpointAvailable is permanently set to false (until a page refresh). If the backend returns 404 transiently (deploy gaps, blue-green switches), the user must refresh to restore the batch path. Consider a TTL-based reset, e.g. retrying the batch endpoint after 5 minutes.

Example: add a TTL reset
-let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let isBatchProbeLogsEndpointAvailable: boolean | undefined;
+let batchDisabledAt: number | undefined;
+const BATCH_RETRY_INTERVAL_MS = 5 * 60 * 1000;
+
+function isBatchDisabled(): boolean {
+  if (isBatchProbeLogsEndpointAvailable === false) {
+    if (batchDisabledAt && Date.now() - batchDisabledAt > BATCH_RETRY_INTERVAL_MS) {
+      isBatchProbeLogsEndpointAvailable = undefined;
+      batchDisabledAt = undefined;
+      return false;
+    }
+    return true;
+  }
+  return false;
+}

Then record the time in the 404 handling:

       if (res.status === 404) {
         isBatchProbeLogsEndpointAvailable = false;
+        batchDisabledAt = Date.now();

95-206: ~250 lines of batch data-fetching logic deserve extraction into a separate module.

tryFetchBatchProbeLogsByEndpointIds, fetchProbeLogsByEndpointIdsIndividually, fetchProbeLogsByEndpointIds, normalizeProbeLogsByEndpointId, and the ProbeLogsBatcher class form a complete data-fetching layer whose responsibilities differ from the UI component (sparkline rendering). Extracting them into e.g. @/lib/probe-logs-batcher.ts would help:

  • Test the batching/fallback/merge logic in isolation
  • Reuse the batcher in other components
  • Reduce the cognitive load of the current file

Also applies to: 208-236, 238-245, 247-307


216-232: The concurrent worker pattern is correct but relies on JS single-threaded semantics — worth a comment.

The idx variable is shared across multiple async workers; the read and increment occur in the same synchronous section (before any await), so this is safe under the JS event-loop model. But the pattern is not obvious, so a one-line comment explaining this safety assumption is recommended.

@tesgth032
Contributor Author

Following the CodeRabbit review, I did a small round of follow-ups (no UX change), mainly closing out corner cases and consistency issues:

  • Provider statistics: removed the emoji comment in src/repository/provider.ts; db.execute() results now go through Array.from() before being typed; added in-flight dedup to reduce the concurrency spike at cache expiry.
  • useInViewOnce: stabilized options at the semantic level, so callers passing a new object each render no longer cause the IntersectionObserver to be rebuilt repeatedly.
  • EndpointLatencySparkline: the timer-triggered flush() gained a top-level catch; failed batch chunks now degrade best-effort (keep successful chunks, fall back per-endpoint only for failed/missing endpointIds), reducing duplicate fetches and request-storm amplification.
  • Create Provider Key: added invalidation of providers-health / providers-statistics (matching the edit flow, avoiding stale caches on the same page).

Verified locally: npm run lint, npm run typecheck, npm run test, npm run build all pass; CI checks are all green.

@tesgth032
Contributor Author

Followed up on CodeRabbit's latest two actionable items, plus a small cache TTL fix (no change to UX/interface semantics):

  • EndpointLatencySparkline: normalizeProbeLogsByEndpointId() now does minimal structural validation of the logs array (filtering malformed entries), and the single-item fallback path reuses the same normalization, so the downstream sparkline never processes missing fields.
  • useInViewOnce: switched to a callback ref + element state so the effect correctly creates the observer once a late-mounted element appears (the fast path for tests / no IntersectionObserver is unchanged).
  • getProviderStatistics: the cache write now uses Date.now() for expiresAt, so slow queries no longer shorten the cache's effective TTL.

Local and CI both pass (build/lint/typecheck/test all green), and CodeRabbit is back to Approved.

@tesgth032
Contributor Author

Closed out another round of #779-related work (hot-path performance + genuine bug fixes + security tightening), keeping UX/interface semantics unchanged as much as possible:

  • Runtime endpoint selection: getPreferredProviderEndpoints / getEndpointFilterStats now read endpoint circuit states in one batch (getAllEndpointHealthStatusAsync), avoiding per-endpoint Redis round-trip amplification.
  • Probe write tolerance: recordProviderEndpointProbeResult updates the endpoint snapshot first; if the endpoint was concurrently deleted during probing, it is silently ignored (FK failures no longer leak out of the probe scheduler / manual probes).
  • Endpoint change consistency: best-effort resetEndpointCircuit after successful edit/delete/manual probe, reducing the confusing "probe succeeded but still circuit-open" state.
  • Index migrations: added idx_provider_endpoints_pick_enabled / idx_providers_vendor_type_url_active (new indexes only, no table structure changes); upgrades are seamless with Docker's default AUTO_MIGRATE=true.
  • Security tightening: src/repository/provider-endpoints-batch.ts drops the top-level "use server" in favor of server-only (avoiding accidental exposure as a Server Action).

Verified locally: bun run lint (pre-existing warnings only), bun run lint:fix, bun run typecheck, bun run test, bun run build, bun run db:generate.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/app/[locale]/settings/providers/_components/forms/provider-form/index.tsx (1)

370-374: ⚠️ Potential issue | 🟠 Major

Invalidating only the provider-vendors cache on successful creation fits the PR goal, but the edit path has a flaw: the edit operation has no form-level cache invalidation and relies entirely on the optional onSuccess callback; if the callback fails (lines 378-380 only log the error), the cache is never refreshed.

Suggested improvement: either add form-level invalidation to the edit path matching the create path (e.g. queryClient.invalidateQueries({ queryKey: ["providers"] })), or make onSuccess a required parameter, guaranteeing reliable cache refresh.

tests/unit/actions/total-usage-semantics.test.ts (1)

48-52: ⚠️ Potential issue | 🟡 Minor

Missing mock for SessionTracker.getUserSessionCount

getMyQuota now calls getUserConcurrentSessions(userId) → SessionTracker.getUserSessionCount(userId), but the SessionTracker mock here only provides getKeySessionCount, not getUserSessionCount. Since getUserConcurrentSessions has an internal try-catch falling back to 0, the test will not fail, but this is "silently swallowing errors" rather than correct mock behavior.

🛡️ Suggested mock addition
 vi.mock("@/lib/session-tracker", () => ({
   SessionTracker: {
     getKeySessionCount: (...args: unknown[]) => getKeySessionCountMock(...args),
+    getUserSessionCount: vi.fn().mockResolvedValue(0),
   },
 }));
🤖 Fix all issues with AI agents
In `@src/actions/provider-endpoints.ts`:
- Around line 567-583: The code calls resetEndpointCircuitState again after a
successful probe, creating redundant Redis round-trips because
probeProviderEndpointAndRecordByEndpoint already calls resetEndpointCircuit when
appropriate; remove the entire conditional block that checks result.ok and tries
to call resetEndpointCircuitState (the try/catch logging calling
resetEndpointCircuitState with endpoint.id) from
src/actions/provider-endpoints.ts so the single reset path inside
probeProviderEndpointAndRecordByEndpoint (which invokes resetEndpointCircuit) is
the only place handling circuit resets.

In `@src/repository/statistics.ts`:
- Around line 984-1019: The outer WHERE uses scanEnd as an upper bound which can
truncate the costTotal aggregate (filtered only by createdAt >= cutoffDate); fix
by computing scanEnd to include the current time so it cannot be earlier than
now — i.e. set scanEnd = new Date(Math.max(..., Date.now())) when building
scanEnd (the same pattern should be applied in the analogous
sumKeyQuotaCostsById logic); update uses of scanEnd (and any other range
end-time max calculations) so costTotal (and related totals) are not
artificially upper-bounded by stale range endTimes.
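A minimal sketch of the scanEnd clamping this agent prompt describes; the variable names are illustrative.

// Illustrative fix: the scan window's upper bound can never fall before the
// current time, so aggregates filtered only by a lower bound (createdAt >=
// cutoffDate) are not truncated by stale range end times.
function computeScanEnd(rangeEndTimes: Date[]): Date {
  return new Date(Math.max(...rangeEndTimes.map((d) => d.getTime()), Date.now()));
}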
🧹 Nitpick comments (8)
src/repository/usage-logs.ts (3)

380-417: The COUNT and data queries can run in parallel

In findUsageLogsForKeySlim, the count query (line 380) and the paginated data query (line 388) run sequentially, but they are independent of each other and can be parallelized with Promise.all to reduce latency.

♻️ Suggestion: parallelize with Promise.all
- const [countResult] = await db
-   .select({ totalRows: sql<number>`count(*)::double precision` })
-   .from(messageRequest)
-   .where(and(...conditions));
-
- const total = countResult?.totalRows ?? 0;
-
- const offset = (safePage - 1) * safePageSize;
- const results = await db
-   .select({
-     id: messageRequest.id,
-     ...
-   })
-   .from(messageRequest)
-   .where(and(...conditions))
-   .orderBy(desc(messageRequest.createdAt), desc(messageRequest.id))
-   .limit(safePageSize)
-   .offset(offset);
+ const offset = (safePage - 1) * safePageSize;
+ const [countResult, results] = await Promise.all([
+   db
+     .select({ totalRows: sql<number>`count(*)::double precision` })
+     .from(messageRequest)
+     .where(and(...conditions)),
+   db
+     .select({
+       id: messageRequest.id,
+       // ... remaining fields
+     })
+     .from(messageRequest)
+     .where(and(...conditions))
+     .orderBy(desc(messageRequest.createdAt), desc(messageRequest.id))
+     .limit(safePageSize)
+     .offset(offset),
+ ]);
+
+ const total = countResult?.[0]?.totalRows ?? 0;

420-444: The cache write's expiresAt uses the caller-supplied now rather than the current time

Per the discussion in the PR comments, cache writes should use Date.now() for expiresAt rather than a now captured by the caller before the query, to avoid shortening the effective TTL under slow queries. Currently setDistinctKeyOptionsCache uses the caller's now (captured before the DB query), which can reduce the effective TTL.

♻️ Suggestion: use Date.now() inside setDistinctKeyOptionsCache
 function setDistinctKeyOptionsCache(
   cache: Map<string, { data: string[]; expiresAt: number }>,
   key: string,
   data: string[],
-  now: number
 ): void {
+  const now = Date.now();
   if (cache.size >= DISTINCT_KEY_OPTIONS_CACHE_MAX_SIZE) {
     for (const [k, v] of cache) {
       if (v.expiresAt <= now) {
         cache.delete(k);
       }
     }
 
     if (cache.size >= DISTINCT_KEY_OPTIONS_CACHE_MAX_SIZE) {
       cache.clear();
     }
   }
 
   cache.set(key, { data, expiresAt: now + DISTINCT_KEY_OPTIONS_CACHE_TTL_MS });
 }

284-301: UsageLogSlimFilters and UsageLogSlimRow are not exported

findUsageLogsForKeySlim is an exported function, but its parameter type UsageLogSlimFilters and the UsageLogSlimRow in its return value are not exported. TypeScript's structural typing lets callers pass literal objects, but external code that needs to reference these types (e.g. to build type-safe arguments for test mocks) is stuck. The AI summary also lists them as newly added public types.

♻️ Suggested: export both interfaces
-interface UsageLogSlimFilters {
+export interface UsageLogSlimFilters {
   keyString: string;
   ...
 }

-interface UsageLogSlimRow {
+export interface UsageLogSlimRow {
   id: number;
   ...
 }

Also applies to: 303-319

src/actions/my-usage.ts (1)

600-648: keyBreakdown and userBreakdown select inconsistent columns

The keyBreakdown query includes cacheCreation5mTokens and cacheCreation1hTokens (lines 611-612), but the userBreakdown query lacks these two fields (lines 627-636). They are only used for summary-level statistics (reduced from keyBreakdown) and don't affect the userModelBreakdown mapping, yet inconsistent select lists across two parallel queries invite confusion, and surfacing the fields in userModelBreakdown later would require another change.

src/lib/provider-endpoints/endpoint-selector.ts (2)

41-45: findEnabledProviderEndpointsByVendorAndType already filters isEnabled and deletedAt at the DB layer, so the application-level filter is redundant

The query in findEnabledProviderEndpointsByVendorAndType already includes is_enabled = true AND deleted_at IS NULL (see src/repository/provider-endpoints.ts lines 823-828), so the e.isEnabled && !e.deletedAt check on line 45 is duplicated. Only the excludeSet.has(e.id) filter is necessary.

♻️ Simplified filtering
- const filtered = endpoints.filter((e) => e.isEnabled && !e.deletedAt && !excludeSet.has(e.id));
+ const filtered = excludeSet.size > 0
+   ? endpoints.filter((e) => !excludeSet.has(e.id))
+   : endpoints;

81-98: getEndpointFilterStats filters enabled endpoints twice

Lines 83 and 90 run the same e.isEnabled && !e.deletedAt filter over the same array. The result of a single pass can be reused.

♻️ Deduplicated filtering
   const endpoints = await findProviderEndpointsByVendorAndType(input.vendorId, input.providerType);
   const total = endpoints.length;
-  const enabled = endpoints.filter((e) => e.isEnabled && !e.deletedAt).length;
+  const enabledEndpoints = endpoints.filter((e) => e.isEnabled && !e.deletedAt);
+  const enabled = enabledEndpoints.length;
 
   // When endpoint circuit breaker is disabled, no endpoints can be circuit-open
   if (!getEnvConfig().ENABLE_ENDPOINT_CIRCUIT_BREAKER) {
     return { total, enabled, circuitOpen: 0, available: enabled };
   }
 
-  const enabledEndpoints = endpoints.filter((e) => e.isEnabled && !e.deletedAt);
   if (enabledEndpoints.length === 0) {
     return { total, enabled: 0, circuitOpen: 0, available: 0 };
   }
src/repository/provider-endpoints.ts (1)

793-833: Reuse the providerEndpointSelectFields constant to cut duplication.

The .select(...) field list of findEnabledProviderEndpointsByVendorAndType is identical to findProviderEndpointsByVendorAndType (lines 762-779) and to the providerEndpointSelectFields constant defined at line 209. Reusing that constant directly reduces the maintenance burden.

♻️ Suggested change
   const rows = await db
-    .select({
-      id: providerEndpoints.id,
-      vendorId: providerEndpoints.vendorId,
-      providerType: providerEndpoints.providerType,
-      url: providerEndpoints.url,
-      label: providerEndpoints.label,
-      sortOrder: providerEndpoints.sortOrder,
-      isEnabled: providerEndpoints.isEnabled,
-      lastProbedAt: providerEndpoints.lastProbedAt,
-      lastProbeOk: providerEndpoints.lastProbeOk,
-      lastProbeStatusCode: providerEndpoints.lastProbeStatusCode,
-      lastProbeLatencyMs: providerEndpoints.lastProbeLatencyMs,
-      lastProbeErrorType: providerEndpoints.lastProbeErrorType,
-      lastProbeErrorMessage: providerEndpoints.lastProbeErrorMessage,
-      createdAt: providerEndpoints.createdAt,
-      updatedAt: providerEndpoints.updatedAt,
-      deletedAt: providerEndpoints.deletedAt,
-    })
+    .select(providerEndpointSelectFields)
     .from(providerEndpoints)
tests/unit/lib/provider-endpoints/endpoint-selector.test.ts (1)

108-131: The health-status mock type is duplicated across test blocks.

The return type of getAllStatusMock (a Record carrying failureCount, circuitState, and related fields) is repeated three times at lines 108-131, 206-229, and 291-314. Consider lifting the type into a file-level alias or extracting a helper that builds the mock, as sketched below. Readability comes first in test code, so this is optional.
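One possible shape for that helper, as a sketch only (the EndpointHealthSnapshot field names mirror the health-status records quoted in this review, not a type the repo necessarily exports):

type EndpointHealthSnapshot = {
  failureCount: number;
  lastFailureTime: number | null;
  circuitState: "closed" | "open" | "half-open";
  circuitOpenUntil: number | null;
  halfOpenSuccessCount: number;
};

// Healthy-closed defaults; each test overrides only the fields it cares about.
function makeHealth(
  overrides: Partial<EndpointHealthSnapshot> = {}
): EndpointHealthSnapshot {
  return {
    failureCount: 0,
    lastFailureTime: null,
    circuitState: "closed",
    circuitOpenUntil: null,
    halfOpenSuccessCount: 0,
    ...overrides,
  };
}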


@greptile-apps greptile-apps bot left a comment


34 files reviewed, 1 comment


Comment on lines 407 to 409 of src/actions/provider-endpoints.ts
const shouldResetCircuit =
parsed.data.url !== undefined ||
(parsed.data.isEnabled === true && previous.isEnabled === false);

Circuit reset triggers on unchanged URL

shouldResetCircuit checks parsed.data.url !== undefined, but the edit form (EditEndpointDialog) always sends the url field — even when only label or sortOrder was changed. This means every edit will reset the circuit breaker, regardless of what actually changed.

Since this is best-effort and has no functional impact beyond an extra Redis write, this is low severity. However, comparing against the previous value would be more precise:

Suggested change
const shouldResetCircuit =
parsed.data.url !== undefined ||
(parsed.data.isEnabled === true && previous.isEnabled === false);
const shouldResetCircuit =
(parsed.data.url !== undefined && parsed.data.url !== previous.url) ||
(parsed.data.isEnabled === true && previous.isEnabled === false);


@greptile-apps greptile-apps bot left a comment


34 files reviewed, 3 comments


Comment on line 341 of src/repository/usage-logs.ts

const safePage = page > 0 ? page : 1;
const safePageSize = Math.min(100, Math.max(1, pageSize));

const conditions = [isNull(messageRequest.deletedAt), eq(messageRequest.key, keyString)];

Missing EXCLUDE_WARMUP_CONDITION in slim query

findUsageLogsForKeySlim does not include EXCLUDE_WARMUP_CONDITION in its conditions array, unlike the original findUsageLogsWithDetails which applies it. This means warmup requests will now be counted in usage logs results for users, potentially showing inflated request counts and costs.

For comparison, findUsageLogsStats at line 885 explicitly adds EXCLUDE_WARMUP_CONDITION to statsConditions, and the old findUsageLogsWithDetails also applies it.

Suggested change
const conditions = [isNull(messageRequest.deletedAt), eq(messageRequest.key, keyString)];
const conditions = [isNull(messageRequest.deletedAt), eq(messageRequest.key, keyString), EXCLUDE_WARMUP_CONDITION];

Comment on lines 83 to 111 of src/repository/provider-endpoints-batch.ts
const query = sql`
SELECT
id,
endpoint_id as "endpointId",
source,
ok,
status_code as "statusCode",
latency_ms as "latencyMs",
error_type as "errorType",
error_message as "errorMessage",
created_at as "createdAt"
FROM (
SELECT
id,
endpoint_id,
source,
ok,
status_code,
latency_ms,
error_type,
error_message,
created_at,
ROW_NUMBER() OVER (PARTITION BY endpoint_id ORDER BY created_at DESC) AS row_num
FROM provider_endpoint_probe_logs
WHERE endpoint_id IN (${idList})
) ranked
WHERE ranked.row_num <= ${limitPerEndpoint}
ORDER BY ranked.endpoint_id ASC, ranked.created_at DESC
`;

SQL injection via unsanitized endpoint IDs

The endpointIds are interpolated into raw SQL via sql.join without parameterization. While these are numbers validated by the Zod schema at the action layer, this raw SQL construction pattern bypasses Drizzle's built-in parameterization. If this function were ever called from a path without prior validation, it would be vulnerable.

Consider using Drizzle's inArray operator (as done in findVendorTypeEndpointStatsBatch above) or ensure the values are parameterized:

Suggested change
const query = sql`
SELECT
id,
endpoint_id as "endpointId",
source,
ok,
status_code as "statusCode",
latency_ms as "latencyMs",
error_type as "errorType",
error_message as "errorMessage",
created_at as "createdAt"
FROM (
SELECT
id,
endpoint_id,
source,
ok,
status_code,
latency_ms,
error_type,
error_message,
created_at,
ROW_NUMBER() OVER (PARTITION BY endpoint_id ORDER BY created_at DESC) AS row_num
FROM provider_endpoint_probe_logs
WHERE endpoint_id IN (${idList})
) ranked
WHERE ranked.row_num <= ${limitPerEndpoint}
ORDER BY ranked.endpoint_id ASC, ranked.created_at DESC
`;
const query = sql`
SELECT
id,
endpoint_id as "endpointId",
source,
ok,
status_code as "statusCode",
latency_ms as "latencyMs",
error_type as "errorType",
error_message as "errorMessage",
created_at as "createdAt"
FROM (
SELECT
id,
endpoint_id,
source,
ok,
status_code,
latency_ms,
error_type,
error_message,
created_at,
ROW_NUMBER() OVER (PARTITION BY endpoint_id ORDER BY created_at DESC) AS row_num
FROM provider_endpoint_probe_logs
WHERE endpoint_id = ANY(${endpointIds})
) ranked
WHERE ranked.row_num <= ${limitPerEndpoint}
ORDER BY ranked.endpoint_id ASC, ranked.created_at DESC
`;

Using = ANY($1) with a proper array parameter is both safer and cleaner than building an inline IN (...) list.


Comment on lines 460 to 471 of src/actions/my-usage.ts
@@ -482,9 +468,7 @@ export async function getMyUsageLogs(
minRetryCount: filters.minRetryCount,
page,
pageSize,
};

const result = await findUsageLogsWithDetails(usageFilters);
});

Missing sessionId filter in slim query call

The old code passed all filters from UsageLogFilters to findUsageLogsWithDetails, but this new call to findUsageLogsForKeySlim does not pass filters.sessionId. The UsageLogSlimFilters interface does accept sessionId, so this looks like an accidental omission. Users filtering by session ID in the usage logs UI will get unfiltered results.



@greptile-apps greptile-apps bot left a comment


34 files reviewed, 2 comments


Comment on line 933 of src/repository/provider.ts

ELSE (provider_chain->-1->>'id')::int
END AS final_provider_id,
cost_usd
FROM message_request, bounds

Implicit cross join with bounds CTE

The FROM message_request, bounds syntax is a cross join. Since bounds returns exactly one row, this is functionally correct and equivalent to using CROSS JOIN bounds. However, it relies on the CTE always returning exactly one row — if bounds ever returned zero rows (e.g., due to a CURRENT_TIMESTAMP edge case in some PostgreSQL configurations), the entire subquery would return no rows silently.

This is safe in practice since bounds uses deterministic DATE_TRUNC(CURRENT_TIMESTAMP) expressions, but an explicit CROSS JOIN would make the intent clearer:

            FROM message_request CROSS JOIN bounds


Comment on line 56 of src/app/[locale]/settings/providers/_components/provider-endpoint-hover.tsx

unhealthy: 0,
unknown: 0,
});
}

Module-level mutable state shared across components

pendingVendorIdsByProviderType, deferredByProviderTypeVendorId, and vendorStatsFlushTimer are module-level mutable variables. While this batching pattern works well for coalescing concurrent requests, it means:

  1. The state persists across React component unmount/remount cycles — deferred promises from unmounted components will still be resolved.
  2. In a Fast Refresh (HMR) scenario during development, old deferred callbacks may reference stale closures.

This is a known trade-off of the micro-batching pattern and is unlikely to cause user-visible issues in production, but it's worth noting for maintainability. Consider wrapping this logic in a class (similar to ProbeLogsBatcher in endpoint-latency-sparkline.tsx) for better encapsulation and testability.
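For reference, the encapsulation could look roughly like this; a hedged sketch with illustrative names, not the repo's actual ProbeLogsBatcher:

type Deferred<T> = { resolve: (value: T) => void; reject: (err: unknown) => void };

class MicroBatcher<K, V> {
  private pending = new Map<K, Deferred<V>[]>();
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private flushFn: (keys: K[]) => Promise<Map<K, V>>,
    private delayMs = 10
  ) {}

  request(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      const waiters = this.pending.get(key) ?? [];
      waiters.push({ resolve, reject });
      this.pending.set(key, waiters);
      // One timer per window: every key collected within delayMs shares a flush.
      this.timer ??= setTimeout(() => void this.flush(), this.delayMs);
    });
  }

  private async flush(): Promise<void> {
    this.timer = null;
    const batch = this.pending;
    this.pending = new Map();
    try {
      const results = await this.flushFn([...batch.keys()]);
      for (const [key, waiters] of batch) {
        const value = results.get(key);
        for (const w of waiters) {
          if (value !== undefined) w.resolve(value);
          else w.reject(new Error(`no result for key ${String(key)}`));
        }
      }
    } catch (error) {
      for (const waiters of batch.values()) {
        for (const w of waiters) w.reject(error);
      }
    }
  }
}

Instantiating one batcher per QueryClient (e.g. via a React context) rather than at module scope would sidestep both the unmount and the HMR concern.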


@tesgth032
Contributor Author

Following up on the latest AI reviews (CodeRabbit / Greptile / Gemini) with one more round of convergence (keeping existing UX/API semantics unchanged as far as possible):

  • CodeRabbit 2806945634: removed the redundant resetEndpointCircuitState call after a successful probeProviderEndpoint (the inner probe already handles it), cutting redundant Redis round-trips.
  • CodeRabbit 2806945635: the quota/summary scan upper bound scanEnd is now guaranteed to be >= now, so costTotal can no longer be truncated by a stale range.endTime.
  • Greptile: editProviderEndpoint resets the circuit only when the URL actually changed or the endpoint flips from disabled to enabled, avoiding pointless Redis writes.
  • Greptile: findUsageLogsForKeySlim now includes EXCLUDE_WARMUP_CONDITION; getMyUsageLogs adds a sessionId filter and passes it through; the probe logs batch defensively filters endpointIds with Number.isSafeInteger.
  • Gemini security note: the repository modules touched by this PR are now server-only (provider.ts / usage-logs.ts / statistics.ts), so they cannot be accidentally exposed as Server Actions.

Verified locally: bun run build, bun run lint (only pre-existing warnings), bun run lint:fix, bun run typecheck, bun run test.

@coderabbitai The two points from the previous changes-requested review are covered by these fixes; please re-review / update the status when convenient.


@greptile-apps greptile-apps bot left a comment


34 files reviewed, 2 comments


Comment on line 997 of src/repository/provider.ts

return data;
})();

providerStatisticsInFlight = { timezone, promise: fetchPromise };

In-flight set after IIFE starts

providerStatisticsInFlight is assigned after the async IIFE has already started executing on line 910. While this is safe in practice (the IIFE won't yield until the first await db.execute(query) on line 978, by which point line 997 has already executed), this ordering makes the code fragile — if any await were added earlier in the IIFE (e.g., for logging or validation), a concurrent caller arriving between lines 910 and 997 would bypass the dedup and start a second query.

Consider assigning the in-flight record before the IIFE begins executing, or wrapping the pattern with a helper that atomically sets the in-flight promise.
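A minimal sketch of that atomic registration (names are illustrative, not the repo's actual code):

type Stats = unknown; // stand-in for the repo's statistics result type

let inFlight: { timezone: string; promise: Promise<Stats> } | null = null;

function getStatisticsDeduped(
  timezone: string,
  run: (tz: string) => Promise<Stats>
): Promise<Stats> {
  if (inFlight && inFlight.timezone === timezone) {
    return inFlight.promise;
  }
  const promise = run(timezone).finally(() => {
    // Clear only if we are still the registered in-flight record.
    if (inFlight?.promise === promise) inFlight = null;
  });
  // Registered in the same synchronous turn as the call above, so awaits
  // inside `run` cannot open a window where a concurrent caller misses it.
  inFlight = { timezone, promise };
  return promise;
}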


Comment on lines 630 to 639 of src/actions/my-usage.ts
db
.select({
model: messageRequest.model,
requests: sql<number>`count(*)::int`,
cost: sql<string>`COALESCE(sum(${messageRequest.costUsd}), 0)`,
inputTokens: sql<number>`COALESCE(sum(${messageRequest.inputTokens}), 0)::double precision`,
outputTokens: sql<number>`COALESCE(sum(${messageRequest.outputTokens}), 0)::double precision`,
cacheCreationTokens: sql<number>`COALESCE(sum(${messageRequest.cacheCreationInputTokens}), 0)::double precision`,
cacheReadTokens: sql<number>`COALESCE(sum(${messageRequest.cacheReadInputTokens}), 0)::double precision`,
})

userBreakdown missing 5m/1h cache columns

The keyBreakdown query selects cacheCreation5mTokens and cacheCreation1hTokens (lines 614-615), but userBreakdown does not. While this doesn't cause a runtime error (the ModelBreakdownItem type doesn't include these fields, and only summaryAcc uses them from keyBreakdown), it creates an asymmetry that could mislead future maintainers into thinking the two queries have identical schemas.

Consider adding the same columns to userBreakdown for consistency, or leaving a comment explaining why they're intentionally omitted.


@coderabbitai

coderabbitai bot commented Feb 14, 2026

@tesgth032 Thanks for the thorough update! I can see you have systematically addressed the feedback from CodeRabbit, Greptile, and Gemini, including:

✅ Removed the redundant circuit-reset call
✅ Fixed the quota scan bounds so costTotal stays complete
✅ Tightened the circuit-reset trigger condition
✅ Filled in the missing filter conditions and input validation
✅ Hardened the repository modules with server-only isolation

You have also completed full local verification (build, lint, typecheck, test).

I will now re-review all changes in this PR.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@greptile-apps greptile-apps bot left a comment


36 files reviewed, 3 comments


Comment on lines 150 to 160 of src/lib/endpoint-circuit-breaker.ts
const existingHealth = healthMap.get(endpointId);
if (!existingHealth || redisState.circuitState !== existingHealth.circuitState) {
healthMap.set(endpointId, {
failureCount: redisState.failureCount,
lastFailureTime: redisState.lastFailureTime,
circuitState: redisState.circuitState,
circuitOpenUntil: redisState.circuitOpenUntil,
halfOpenSuccessCount: redisState.halfOpenSuccessCount,
});
}
continue;

Stale failureCount/halfOpenSuccessCount when only circuitState differs

On line 151, the condition redisState.circuitState !== existingHealth.circuitState gates whether the in-memory health map is updated. If two instances share Redis state and one increments failureCount while the circuit remains closed, this instance will not pick up the updated failure count because the circuit state hasn't changed.

This means the local failure count can drift from Redis until a state transition occurs. For the circuit breaker's intended purpose (blocking requests to failing endpoints), this is low severity — the state transitions (closed→open) will still propagate. But it means admin dashboards reading from this function may show stale failure counts.


Comment on lines 1010 to 1012 of src/repository/provider.ts
} catch (error) {
deferred.reject(error);
}

Error path does not clear providerStatisticsInFlight

When deferred.reject(error) is called on line 1011, the rejection propagates to the caller's await deferred.promise on line 1016, where the finally block clears providerStatisticsInFlight. However, if the caller's finally block throws (unlikely but possible), the in-flight record would leak.

More practically: because providerStatisticsInFlight is set to this deferred on line 922 and only cleared in the caller's finally, there's a window where the IIFE has rejected but providerStatisticsInFlight still references the rejected promise. During this window, a concurrent caller on line 906 would await the already-rejected promise and get the same error — which is actually the correct dedup behavior. So this is safe, just worth noting that the error is shared across all waiters.


Comment on lines 46 to 64 of src/repository/statistics.ts
const keyString = keyRecord?.[0]?.key ?? null;
if (!keyString) {
keyStringByIdCache.delete(keyId);
return null;
}

// Size control: clean expired entries first; if still over the cap, evict a small FIFO slice rather than clearing everything, which would cause a cache stampede
if (keyStringByIdCache.size >= KEY_STRING_BY_ID_CACHE_MAX_SIZE) {
for (const [id, value] of keyStringByIdCache) {
if (value.expiresAt <= now) {
keyStringByIdCache.delete(id);
}
}

if (keyStringByIdCache.size >= KEY_STRING_BY_ID_CACHE_MAX_SIZE) {
const evictCount = Math.max(1, Math.ceil(KEY_STRING_BY_ID_CACHE_MAX_SIZE * 0.1));
let remaining = evictCount;

for (const id of keyStringByIdCache.keys()) {

FIFO eviction doesn't evict the oldest entries

The eviction logic uses Map iteration order (insertion order), evicting the first 10% of entries. After expired entries are cleaned, the remaining entries may have been refreshed recently, but Map preserves insertion order from the original set call — not the most recent one. When a cache hit refreshes TTL via set, the entry keeps its original position in the iteration order.

This means a frequently accessed entry inserted early will be evicted before an infrequently accessed entry inserted later. Consider deleting and re-inserting on cache refresh to maintain LRU-like ordering.

This is low severity since the cache is small (1000 entries) and 5-minute TTL naturally evicts stale entries.
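A tiny sketch of the delete-and-reinsert refresh, generic over the cache's key and value types:

function touchCacheEntry<K, V>(cache: Map<K, V>, key: K): V | undefined {
  const value = cache.get(key);
  if (value !== undefined) {
    // Re-insert so Map iteration order tracks recency; the existing
    // "evict the first N entries" pass then approximates LRU.
    cache.delete(key);
    cache.set(key, value);
  }
  return value;
}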


@tesgth032
Contributor Author

This round fills in and fixes a few more details on PR #782 (all related to #779 performance/stability, keeping existing UX unchanged where possible):

  • DB migration: renumbered the new index migration to 0068_flawless_khan to avoid clashing with upstream 0067_gorgeous_mulholland_black, and updated drizzle/meta/_journal.json + 0068_snapshot.json so Docker's default AUTO_MIGRATE=true still creates the indexes transparently.
  • Providers UI: removed endpointIds.toSorted(...) (unsupported in some browsers) in favor of a memoized slice().sort().join() for a stable queryKey (see the sketch after this list), avoiding both the compatibility issue and needless repeated allocations.
  • Usage/Quota semantics: sumUserQuotaCosts / sumKeyQuotaCostsById now treat Infinity as explicit all-time semantics (no more implicit 365-day truncation fallback), with new repository unit tests for coverage.
  • Minor maintainability: getEndpointFilterStats drops its duplicate filter; findEnabledProviderEndpointsByVendorAndType reuses the providerEndpointSelectFields constant.
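Roughly the shape of that memoization (a sketch; the hook name is illustrative):

import { useMemo } from "react";

function useStableEndpointIdsKey(endpointIds: number[]): string {
  // slice() avoids mutating the caller's array; the joined string is a
  // stable primitive that can sit directly inside a React Query queryKey.
  return useMemo(
    () => endpointIds.slice().sort((a, b) => a - b).join(","),
    [endpointIds]
  );
}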

Local verification: bun run test / bun run typecheck / bun run lint / bun run build all pass.


@greptile-apps greptile-apps bot left a comment


36 files reviewed, 1 comment


Comment on line 45 of src/lib/provider-endpoints/endpoint-selector.ts

input.vendorId,
input.providerType
);
const filtered = endpoints.filter((e) => e.isEnabled && !e.deletedAt && !excludeSet.has(e.id));

Redundant filter after query change

findEnabledProviderEndpointsByVendorAndType already filters for isEnabled = true and deletedAt IS NULL at the SQL level, so the e.isEnabled && !e.deletedAt checks here are redundant. Only !excludeSet.has(e.id) is still needed.

Suggested change
const filtered = endpoints.filter((e) => e.isEnabled && !e.deletedAt && !excludeSet.has(e.id));
const filtered = endpoints.filter((e) => !excludeSet.has(e.id));

@tesgth032
Contributor Author

This update (syncing dev + following up on a few review concerns):

  • merge: synced origin/dev (v0.5.7 release / VERSION & CHANGELOG) so the PR sits on the latest dev
  • fix: getAllEndpointHealthStatusAsync now also syncs counter fields such as failureCount/halfOpenSuccessCount under forceRefresh, preventing display drift across instances; unit tests added
  • fix(security): findProviderEndpointProbeLogsBatch switched to a parameterized inArray condition, removing the injection/maintenance risk of hand-built IN (...) lists (behavior unchanged)

Verified locally: bun run lint/typecheck/test/build; Actions and Greptile all pass.

@tesgth032
Contributor Author

This round additionally folds fixes for issue #781 (stale endpoints/types lingering in Endpoint Health and being probed continuously) into PR #782, keeping the existing UX as much as possible:

  • fix(Provider Availability Endpoint #781): when the old endpoint has no active provider references left and a unique conflict occurs / the next row already exists, syncProviderEndpointOnProviderEdit now soft-deletes the old endpoint (so stale URLs no longer linger and keep being probed/used by the scheduler/runtime)
  • fix(Provider Availability Endpoint #781): on provider delete / batch delete, if the (vendorId, providerType, url) tuple has no active provider references left, the matching endpoint is soft-deleted in the same pass, preventing orphan endpoints
  • perf/fix(Provider Availability Endpoint #781): findEnabledProviderEndpointsForProbing adds an EXISTS (active providers for vendor/type) filter, so vendor/types without any active provider are no longer probed in the background
  • UX(Provider Availability Endpoint #781): the Endpoint Health (dashboard availability) vendor/type dropdowns are now generated from active providers only (dynamic providerTypes), so historical vendor/types of deleted providers no longer show up

Local verification: bun run lint (only pre-existing warnings), bun run typecheck, bun run test, and bun run build all pass; Actions and Greptile are also fully green.

@tesgth032 tesgth032 force-pushed the issue/779-provider-performance branch from c010d8d to 50d73a9 on February 14, 2026 18:09

@greptile-apps greptile-apps bot left a comment


58 files reviewed, 6 comments


Comment on lines 324 to 341 of src/repository/provider-endpoints.ts
vendorId: providerEndpoints.vendorId,
providerType: providerEndpoints.providerType,
lastProbedAt: providerEndpoints.lastProbedAt,
lastProbeOk: providerEndpoints.lastProbeOk,
lastProbeErrorType: providerEndpoints.lastProbeErrorType,
})
.from(providerEndpoints)
.where(and(eq(providerEndpoints.isEnabled, true), isNull(providerEndpoints.deletedAt)))
.where(
and(
eq(providerEndpoints.isEnabled, true),
isNull(providerEndpoints.deletedAt),
sql`(${providerEndpoints.vendorId}, ${providerEndpoints.providerType}) IN (
SELECT p.provider_vendor_id, p.provider_type
FROM providers p
WHERE p.is_enabled = true
AND p.deleted_at IS NULL
)`
)

Correlated subquery executes for every endpoint row, potentially degrading performance under load

The (vendor_id, provider_type) IN (SELECT ...) pattern creates a correlated subquery that PostgreSQL may not optimize into a hash semi-join. For large endpoint tables scanned on each probe cycle, this could become a bottleneck.

Consider using a JOIN or IN subquery with materialization:

INNER JOIN (
  SELECT DISTINCT provider_vendor_id, provider_type
  FROM providers
  WHERE is_enabled = true AND deleted_at IS NULL
) enabled_providers
  ON provider_endpoints.vendor_id = enabled_providers.provider_vendor_id
  AND provider_endpoints.provider_type = enabled_providers.provider_type

This allows PostgreSQL to build the enabled providers set once and probe via hash lookup.


Comment on lines +615 to 622 of src/repository/provider.ts
if (
previousUrl &&
previousProviderType &&
(previousUrl !== transformed.url ||
previousProviderType !== transformed.providerType ||
(previousVendorId != null && previousVendorId !== transformed.providerVendorId))
) {
const syncResult = await syncProviderEndpointOnProviderEdit(

is_enabled check missing in endpoint sync condition

When only url or providerType changes (but previousIsEnabled stays false), the code still calls syncProviderEndpointOnProviderEdit. However, line 612 checks previousIsEnabled === false && transformed.isEnabled === true for the ensureProviderEndpointExistsForUrl path.

This creates an asymmetry: if a disabled provider's URL changes, it syncs the endpoint (line 610-635) but doesn't ensure existence (line 636-644). Verify this is intentional behavior, or consolidate the isEnabled transition handling.


Comment on lines +161 to +178 of src/lib/endpoint-circuit-breaker.ts
// When syncing from Redis into memory, don't update only when circuitState changes:
// fields like failureCount / halfOpenSuccessCount must also stay consistent under forceRefresh.
const existingHealth = healthMap.get(endpointId);
if (existingHealth) {
existingHealth.failureCount = redisState.failureCount;
existingHealth.lastFailureTime = redisState.lastFailureTime;
existingHealth.circuitState = redisState.circuitState;
existingHealth.circuitOpenUntil = redisState.circuitOpenUntil;
existingHealth.halfOpenSuccessCount = redisState.halfOpenSuccessCount;
} else {
healthMap.set(endpointId, {
failureCount: redisState.failureCount,
lastFailureTime: redisState.lastFailureTime,
circuitState: redisState.circuitState,
circuitOpenUntil: redisState.circuitOpenUntil,
halfOpenSuccessCount: redisState.halfOpenSuccessCount,
});
}

Memory sync only updates when circuitState differs, not when counters change

Line 167 gates the memory update on redisState.circuitState !== existingHealth.circuitState. If two instances share Redis and one increments failureCount while the circuit stays closed, this instance won't pick up the updated counter until a state transition occurs.

For getAllEndpointHealthStatusAsync with forceRefresh=true, this was addressed (lines 161-178 always sync all fields). However, for the single-endpoint getOrCreateHealth path (line 67), stale counters may persist. Consider always syncing all fields when Redis returns data, not just on state changes.


Comment on lines 184 to 187 of src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx
if (!didAnyChunkSucceed) {
isBatchProbeLogsEndpointAvailable = false;
batchProbeLogsEndpointDisabledAt = Date.now();
}

Batch endpoint availability only set on first 404 when no chunks succeeded

If chunk 1 succeeds but chunk 2 returns 404, didAnyChunkSucceed is true, so isBatchProbeLogsEndpointAvailable won't be set to false. This is correct for rolling deployments (some nodes may lack the route temporarily).

However, if the very first batch request across all chunks hits 404 before any succeed, the endpoint is marked unavailable for 5 minutes. Consider a more gradual backoff or exponential retry instead of a fixed 5-minute penalty on first failure.
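One possible backoff shape, illustrative only (the component's real flags and constants differ):

let disabledUntil = 0;
let consecutive404s = 0;

// Double the disable window per consecutive all-404 batch, capped at the
// current 5-minute maximum, instead of a fixed 5-minute penalty up front.
function onAllChunks404(now = Date.now()): void {
  consecutive404s += 1;
  const delayMs = Math.min(5 * 60_000, 15_000 * 2 ** (consecutive404s - 1));
  disabledUntil = now + delayMs;
}

function onAnyChunkSuccess(): void {
  consecutive404s = 0;
  disabledUntil = 0;
}

function isBatchEndpointAvailable(now = Date.now()): boolean {
  return now >= disabledUntil;
}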


Comment on lines 895 to 918 of src/repository/provider.ts

const key = `${candidate.providerVendorId}::${candidate.providerType}::${candidate.url}`;
if (endpointKeys.has(key)) {
continue;
}

endpointKeys.set(key, {
vendorId: candidate.providerVendorId,
providerType: candidate.providerType,
url: candidate.url,
});
}

const endpoints = Array.from(endpointKeys.values());
if (endpoints.length === 0) {
return result.length;
}

const chunkSize = 200;
const activeEndpointKeys = new Set<string>();

for (let i = 0; i < endpoints.length; i += chunkSize) {
const chunk = endpoints.slice(i, i + chunkSize);
const tupleList = sql.join(

Sequential queries inside transaction for batch delete

The for (const candidate of candidates) loop issues a SELECT + conditional UPDATE for each unique (vendor, type, url) combination. For large batch deletes (50+ providers), this holds the transaction open for many round-trips, increasing lock contention risk.

Consider batching the active reference checks into a single query using inArray with tuples or a CTE to reduce round-trips within the transaction.
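A hedged sketch of that single-round-trip check built on a VALUES list (table and column names follow the snippets quoted in this PR; the ::int cast keeps the otherwise untyped parameter comparable to the integer column):

import { sql } from "drizzle-orm";

function buildActiveReferenceQuery(
  endpoints: { vendorId: number; providerType: string; url: string }[]
) {
  const tuples = sql.join(
    endpoints.map((e) => sql`(${e.vendorId}::int, ${e.providerType}, ${e.url})`),
    sql`, `
  );
  // Returns only the tuples that still have an active provider reference;
  // everything absent from the result is an orphan candidate, all in one query.
  return sql`
    SELECT t.vendor_id AS "vendorId", t.provider_type AS "providerType", t.url
    FROM (VALUES ${tuples}) AS t(vendor_id, provider_type, url)
    WHERE EXISTS (
      SELECT 1 FROM providers p
      WHERE p.provider_vendor_id = t.vendor_id
        AND p.provider_type = t.provider_type
        AND p.url = t.url
        AND p.deleted_at IS NULL
    )
  `;
}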


Comment on lines +51 to +60 of src/repository/statistics.ts
const keyString = keyRecord?.[0]?.key ?? null;
if (!keyString) {
keyStringByIdCache.delete(keyId);
return null;
}

// Size control: clean expired entries first; if still over the cap, evict the oldest small slice rather than clearing everything, which would cause a cache stampede
if (keyStringByIdCache.size >= KEY_STRING_BY_ID_CACHE_MAX_SIZE) {
for (const [id, value] of keyStringByIdCache) {
if (value.expiresAt <= now) {

FIFO eviction doesn't maintain LRU ordering on cache refresh

When a cache hit occurs, set refreshes the TTL but preserves the entry's original insertion position in Map iteration order. During eviction (line 51-60), older frequently-accessed entries may be evicted before newer rarely-accessed ones.

For true LRU behavior, delete and re-insert on cache hits to update iteration order. Low severity since 5-minute TTL naturally evicts stale data.


@tesgth032
Contributor Author

This update (rebased onto the latest dev):

  • Rebased issue/779-provider-performance onto the latest dev, picking up dev's new migrations and changes.
  • Resolved the Drizzle migration numbering conflict: the index migration is pinned as drizzle/0068_fuzzy_quasar.sql (slotted between dev's 0067_* and 0069_*).
  • Repaired the Drizzle meta snapshot chain: aligned the id of drizzle/meta/0068_snapshot.json with the prevId of drizzle/meta/0069_snapshot.json, so later bun run db:generate runs don't break on a severed chain.

Local verification:

  • bun run lint (only pre-existing useExhaustiveDependencies warnings)
  • bun run typecheck
  • bun run test
  • bun run build

CI status: all GitHub Actions pass; the Greptile review is still pending (may need a moment to refresh).


@greptile-apps greptile-apps bot left a comment


58 files reviewed, 2 comments


Comment on lines 324 to 342 of src/repository/provider-endpoints.ts
vendorId: providerEndpoints.vendorId,
providerType: providerEndpoints.providerType,
lastProbedAt: providerEndpoints.lastProbedAt,
lastProbeOk: providerEndpoints.lastProbeOk,
lastProbeErrorType: providerEndpoints.lastProbeErrorType,
})
.from(providerEndpoints)
.where(and(eq(providerEndpoints.isEnabled, true), isNull(providerEndpoints.deletedAt)))
.where(
and(
eq(providerEndpoints.isEnabled, true),
isNull(providerEndpoints.deletedAt),
sql`(${providerEndpoints.vendorId}, ${providerEndpoints.providerType}) IN (
SELECT p.provider_vendor_id, p.provider_type
FROM providers p
WHERE p.is_enabled = true
AND p.deleted_at IS NULL
)`
)
)

Correlated subquery executes for every endpoint row in probe query

The (vendor_id, provider_type) IN (SELECT ...) pattern creates a correlated subquery checked per endpoint row. For large endpoint tables scanned on each probe cycle, this degrades performance significantly.

Replace with explicit JOIN for hash semi-join optimization:

Suggested change
vendorId: providerEndpoints.vendorId,
providerType: providerEndpoints.providerType,
lastProbedAt: providerEndpoints.lastProbedAt,
lastProbeOk: providerEndpoints.lastProbeOk,
lastProbeErrorType: providerEndpoints.lastProbeErrorType,
})
.from(providerEndpoints)
.where(and(eq(providerEndpoints.isEnabled, true), isNull(providerEndpoints.deletedAt)))
.where(
and(
eq(providerEndpoints.isEnabled, true),
isNull(providerEndpoints.deletedAt),
sql`(${providerEndpoints.vendorId}, ${providerEndpoints.providerType}) IN (
SELECT p.provider_vendor_id, p.provider_type
FROM providers p
WHERE p.is_enabled = true
AND p.deleted_at IS NULL
)`
)
)
.from(providerEndpoints)
.innerJoin(
db
.selectDistinct({
vendorId: providers.providerVendorId,
providerType: providers.providerType,
})
.from(providers)
.where(and(eq(providers.isEnabled, true), isNull(providers.deletedAt)))
.as('enabled_providers'),
and(
eq(providerEndpoints.vendorId, sql.raw('enabled_providers.vendor_id')),
eq(providerEndpoints.providerType, sql.raw('enabled_providers.provider_type'))
)
)
.where(
and(
eq(providerEndpoints.isEnabled, true),
isNull(providerEndpoints.deletedAt)
)
)

Comment on lines 895 to 920 of src/repository/provider.ts

const key = `${candidate.providerVendorId}::${candidate.providerType}::${candidate.url}`;
if (endpointKeys.has(key)) {
continue;
}

endpointKeys.set(key, {
vendorId: candidate.providerVendorId,
providerType: candidate.providerType,
url: candidate.url,
});
}

const endpoints = Array.from(endpointKeys.values());
if (endpoints.length === 0) {
return result.length;
}

const chunkSize = 200;
const activeEndpointKeys = new Set<string>();

for (let i = 0; i < endpoints.length; i += chunkSize) {
const chunk = endpoints.slice(i, i + chunkSize);
const tupleList = sql.join(
chunk.map(
(endpoint) => sql`(${endpoint.vendorId}, ${endpoint.providerType}, ${endpoint.url})`

Sequential queries inside transaction for batch delete increase lock contention

The loop issues SELECT + conditional UPDATE for each unique (vendor, type, url). For large batch deletes (50+ providers), this holds the transaction open across many sequential queries, risking lock contention.

Batch the active reference checks into a single query using a CTE or VALUES list to reduce round-trips within the transaction.


tesgth032 and others added 2 commits February 15, 2026 03:27
- Provider statistics: eliminate the implicit cross join and tighten in-flight cleanup; deleteProvidersBatch makes fewer round-trips inside the transaction
- Providers hover: isolate the micro-batch per QueryClient and support AbortSignal, reducing cross-talk and potential leaks
- Probe/circuit/cache: probe target query switched to a join; Redis sync also updates counter fields; statistics cache keeps FIFO semantics
- My Usage: userBreakdown gains the 5m/1h cache aggregate columns (not shown in the current UI)
@tesgth032
Contributor Author

Follow-up: one more round of small fixes and performance hardening (no UX changes), mainly addressing the corner-case performance and consistency issues raised in the Greptile/review feedback:

  • probe scheduler: findEnabledProviderEndpointsForProbing switched to an enabled_vendor_types CTE + INNER JOIN, keeping the planner off an inefficient semi-join on large tables
  • provider deletion: deleteProvidersBatch makes fewer round-trips inside the transaction; orphan endpoint cleanup becomes UPDATE ... WHERE tuple IN (...) AND NOT EXISTS (...), reducing lock hold time and in-memory sets
  • provider statistics: dropped the message_request CROSS JOIN bounds construction; the in-flight promise is now cleared on success and failure alike, so an error path can no longer wedge the cache
  • hover statistics: the vendor/type stats micro-batch is isolated per QueryClient and supports AbortSignal, reducing cross-talk and potential leaks after unmount/cancel
  • circuit state: syncing Redis into memory no longer updates only when circuitState changes, so counter/timestamp fields can't go stale
  • statistics cache: FIFO eviction semantics are preserved (a hit only refreshes the TTL and no longer changes Map insertion order)
  • my-usage: userBreakdown gains the 5m/1h cache aggregate columns (not exposed by the current API/UI)

Re-ran locally: bun run lint, bun run typecheck, bun run test, bun run build.


@greptile-apps greptile-apps bot left a comment


58 files reviewed, 2 comments


Comment on line 80 of src/repository/provider-endpoints-batch.ts

}

const limitPerEndpoint = Math.max(1, input.limitPerEndpoint);
const endpointIdCondition = inArray(providerEndpointProbeLogs.endpointId, endpointIds);

Using inArray with Drizzle's parameterization is more secure than building SQL strings, even with validated integers

The action layer validates with Zod, but prefer the parameterized approach for defense-in-depth:

Suggested change
const endpointIdCondition = inArray(providerEndpointProbeLogs.endpointId, endpointIds);
const endpointIdCondition = inArray(providerEndpointProbeLogs.endpointId, endpointIds);


Comment on lines +1110 to +1115 of src/repository/provider.ts
)
SELECT
p.id,
COALESCE(
SUM(CASE
WHEN (mr.created_at AT TIME ZONE ${timezone})::date = (CURRENT_TIMESTAMP AT TIME ZONE ${timezone})::date
AND (
-- Case 1: no retries (provider_chain NULL or empty array), use provider_id
(mr.provider_chain IS NULL OR jsonb_array_length(mr.provider_chain) = 0) AND mr.provider_id = p.id
OR
-- Case 2: retried, use the id of the last providerChain entry
(mr.provider_chain IS NOT NULL AND jsonb_array_length(mr.provider_chain) > 0
AND (mr.provider_chain->-1->>'id')::int = p.id)
)
THEN mr.cost_usd ELSE 0 END),
0
) AS today_cost,
COUNT(CASE
WHEN (mr.created_at AT TIME ZONE ${timezone})::date = (CURRENT_TIMESTAMP AT TIME ZONE ${timezone})::date
AND (
(mr.provider_chain IS NULL OR jsonb_array_length(mr.provider_chain) = 0) AND mr.provider_id = p.id
OR
(mr.provider_chain IS NOT NULL AND jsonb_array_length(mr.provider_chain) > 0
AND (mr.provider_chain->-1->>'id')::int = p.id)
)
THEN 1 END)::integer AS today_calls
COALESCE(ps.today_cost, 0) AS today_cost,
COALESCE(ps.today_calls, 0) AS today_calls,
lc.last_call_time,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tomorrow_start computed as plain timestamp (not timestamptz) due to inner AT TIME ZONE

comparison with timestamptz column relies on PostgreSQL's implicit cast using session timezone - works in UTC but could behave unexpectedly in non-UTC sessions

wrap with outer AT TIME ZONE for type consistency:

Suggested change
)
SELECT
p.id,
COALESCE(
SUM(CASE
WHEN (mr.created_at AT TIME ZONE ${timezone})::date = (CURRENT_TIMESTAMP AT TIME ZONE ${timezone})::date
AND (
-- 情况1:无重试(provider_chain NULL 或空数组),使用 provider_id
(mr.provider_chain IS NULL OR jsonb_array_length(mr.provider_chain) = 0) AND mr.provider_id = p.id
OR
-- 情况2:有重试,使用 providerChain 最后一项的 id
(mr.provider_chain IS NOT NULL AND jsonb_array_length(mr.provider_chain) > 0
AND (mr.provider_chain->-1->>'id')::int = p.id)
)
THEN mr.cost_usd ELSE 0 END),
0
) AS today_cost,
COUNT(CASE
WHEN (mr.created_at AT TIME ZONE ${timezone})::date = (CURRENT_TIMESTAMP AT TIME ZONE ${timezone})::date
AND (
(mr.provider_chain IS NULL OR jsonb_array_length(mr.provider_chain) = 0) AND mr.provider_id = p.id
OR
(mr.provider_chain IS NOT NULL AND jsonb_array_length(mr.provider_chain) > 0
AND (mr.provider_chain->-1->>'id')::int = p.id)
)
THEN 1 END)::integer AS today_calls
COALESCE(ps.today_cost, 0) AS today_cost,
COALESCE(ps.today_calls, 0) AS today_calls,
lc.last_call_time,
today_start AT TIME ZONE '${userTimezone}' AS today_start,
(today_start + INTERVAL '1 day') AT TIME ZONE '${userTimezone}' AS tomorrow_start,
(today_start - INTERVAL '6 days') AT TIME ZONE '${userTimezone}' AS last7_start
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/provider.ts
Line: 1110:1115

Comment:
`tomorrow_start` computed as plain `timestamp` (not `timestamptz`) due to inner `AT TIME ZONE`

comparison with `timestamptz` column relies on PostgreSQL's implicit cast using session timezone - works in UTC but could behave unexpectedly in non-UTC sessions

wrap with outer `AT TIME ZONE` for type consistency:

```suggestion
        today_start AT TIME ZONE '${userTimezone}' AS today_start,
        (today_start + INTERVAL '1 day') AT TIME ZONE '${userTimezone}' AS tomorrow_start,
        (today_start - INTERVAL '6 days') AT TIME ZONE '${userTimezone}' AS last7_start
```

How can I resolve this? If you propose a fix, please make it concise.

- batchUpdateProviders goes through updateProvidersBatch; when providers are batch-enabled from disabled, best-effort insert the missing provider_endpoints rows
- Avoids history/races leaving no usable endpoint under the strict endpoint policy after enabling
@tesgth032
Contributor Author

Filling one more stability gap (still related to #779/#781, no UX changes):

  • When batchUpdateProviders / updateProvidersBatch batch-enables providers, best-effort ensure the endpoint pool contains the corresponding URL (inserting missing provider_endpoints rows); a sketch follows below
  • Goal: avoid the edge case where historical data or races leave the endpoint pool empty after a batch enable, blocking requests under the strict endpoint policy

Re-ran locally: `bun run lint`, `bun run typecheck`, `bun run test`, `bun run build`.
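For illustration, a minimal sketch of the best-effort backfill idea under assumed names; `db`, `providerEndpoints`, and the column set are hypothetical stand-ins for the real schema and repository code:

```typescript
import { db } from "@/drizzle/db"; // hypothetical module paths
import { providerEndpoints } from "@/drizzle/schema";

async function ensureEndpointsForEnabledProviders(
  enabled: Array<{ vendorId: number; providerType: string; url: string }>
): Promise<void> {
  if (enabled.length === 0) return;
  try {
    await db
      .insert(providerEndpoints)
      .values(
        enabled.map((p) => ({
          vendorId: p.vendorId,
          providerType: p.providerType,
          url: p.url,
          isEnabled: true,
        }))
      )
      // Rows may already exist, or another instance may be racing us;
      // best-effort means duplicates are silently skipped.
      .onConflictDoNothing();
  } catch (error) {
    // Best-effort: enabling providers must not fail because of the pool.
    console.warn("endpoint backfill skipped:", error);
  }
}
```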

@tesgth032
Contributor Author

Another gap-filling pass on PR #782 (still strictly scoped to #779/#781), mainly tightening a few success-path refresh amplifiers and multi-instance circuit recovery, plus the same class of SQL regression in my-usage:

  • Providers vendor view: replaced vendors.find(...) with a vendorById map, avoiding O(n^2) lookups when there are many vendors (no UX change).
  • Endpoint circuit reset (UI): after a successful resetEndpointCircuit, no longer invalidateQueries(["endpoint-circuit-info"]) with its global refetch; instead apply a targeted optimistic update only to circuit-info caches that actually contain that endpointId, so one reset no longer fans out into multiple batch circuit requests (invalidation is kept as the fallback on failure).
  • Circuit recovery after a successful probe (multi-instance consistency): a successful probe no longer relies on this instance's in-memory state to decide whether to reset (a missing memory entry misreads as closed); with the endpoint circuit breaker enabled, a successful probe best-effort executes resetEndpointCircuit(endpointId), which also clears open/half-open state written to Redis by other instances (faster, more consistent recovery).
  • My Usage logs (SQL): findUsageLogsForKeySlim drops count(*) OVER() (the window function forced the paginated query into a full scan/sort) in favor of LIMIT pageSize+1 to detect whether a next page exists (see the sketch after this list); an exact total is computed by a separate COUNT(*) only when needed. getDistinctModelsForKey/getDistinctEndpointsForKey also gain the warmup exclusion, matching the logs semantics and avoiding pointless distinct scans.
  • recluster vendors: vendor preloading switched from Promise.all(findProviderVendorById) to findProviderVendorsByIds, avoiding a potential N database round trips (only affects the admin recluster operation).

Local verification: `bun run lint:fix`, `bun run lint`, `bun run typecheck`, `bun run test`, `bun run build` all pass.
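For illustration, a minimal sketch of the pageSize+1 pattern mentioned above; `fetchPage` and `runQuery` are hypothetical helpers, not the actual findUsageLogsForKeySlim code:

```typescript
// Ask the database for one extra row; its presence tells us a next page
// exists without forcing the planner to count (and sort) the full result set.
async function fetchPage<T>(
  runQuery: (limit: number, offset: number) => Promise<T[]>,
  page: number,
  pageSize: number
): Promise<{ rows: T[]; hasNext: boolean }> {
  const rows = await runQuery(pageSize + 1, (page - 1) * pageSize);
  const hasNext = rows.length > pageSize;
  return { rows: hasNext ? rows.slice(0, pageSize) : rows, hasNext };
}
```

A separate COUNT(*) only runs when the UI actually needs an exact total, so routine pagination and polling never pay for it.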

@greptile-apps bot left a comment

61 files reviewed, 2 comments

Comment on src/repository/provider.ts, line 1140:

const promise = Promise.resolve().then(async () => {

**Cache not invalidated on error path**

When the IIFE created by `Promise.resolve().then(async () => { ... })` throws, `providerStatisticsCache` is never updated, which is correct (stale-on-error). However, the `providerStatisticsInFlight` dedup relies on the `try/finally` at line 1234 to clean up, and exceptions thrown before `providerStatisticsInFlight` is assigned (line 1231) would propagate to the outer `catch` at line 1240 without any dedup state to clear.

Also note that `Promise.resolve().then(...)` creates a microtask whose uncaught rejections only propagate when the promise is awaited at line 1235. Even if `resolveSystemTimezone()` (called before the IIFE, at line 1126) were moved inside the IIFE in a future refactor and threw synchronously before the first `await`, the error path would still behave correctly thanks to the `try/finally` wrapping.

This is safe as written, but the `Promise.resolve().then(...)` pattern is somewhat unusual for dedup. A simpler approach is to make the function `async` and assign the dedup promise directly from the call, avoiding the indirection.

Comment on src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx, line 493:

if (showSkeleton) {
  return (
    <div ref={ref} className="flex items-center gap-2">
      <div className="h-6 w-32 rounded bg-muted/20" />

**`isLoading` preferred over `isFetching` for skeleton display**

`showSkeleton` uses `isLoading`, the correct choice for first-load skeletons (this was fixed from an earlier version that used `isFetching`). The condition `!isInView || isLoading` would show the skeleton again if `isInView` ever toggled back to `false` after data loaded; since `useInViewOnce` guarantees `isInView` only transitions from `false` to `true` (never back), this is safe. Noted just for clarity.

- AvailabilityDashboard: suppress overlapping/out-of-order refreshes; throttle forced refresh on foreground/background switches
- Probe scheduler/cleanup: idle DB polling + lock renewal, reducing pointless scans and concurrent cleanup
- Endpoint circuit: throttle Redis sync (1s)
- My Usage: merge key/user breakdown into a single aggregation
- DB: add message_request key+model/endpoint partial index migrations; fix journal monotonicity validation and self-heal the migration table's created_at
@greptile-apps bot left a comment

68 files reviewed, 3 comments

Comment on src/app/providers.tsx, lines 17-19:

queries: {
  staleTime: 30_000,
  refetchOnWindowFocus: false,

**Global `refetchOnWindowFocus: false` affects all queries**

Setting `refetchOnWindowFocus: false` and `staleTime: 30_000` as global defaults applies to every React Query in the application, not just the providers pages targeted by this PR. Pages like the dashboard overview, usage statistics, and leaderboard previously relied on the default `refetchOnWindowFocus: true` to update data when users return to the tab.

While many provider-related queries already set `refetchOnWindowFocus: false` individually, applying it globally could make other pages feel stale: for example, a user switching between a terminal and the usage dashboard would no longer see fresh data without a manual refresh.

Consider whether these defaults should remain global, or whether they should only be set on the specific queries affected by the request-storm issue.
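For illustration, a sketch of scoping these options to an individual query instead of the QueryClient defaults; the query key and fetcher are hypothetical:

```typescript
import { useQuery } from "@tanstack/react-query";

// Hypothetical fetcher standing in for the real server action / HTTP call.
declare function fetchEndpointStats(vendorId: number): Promise<unknown>;

// Per-query opt-out: only this expensive providers query disables focus
// refetching; dashboard/leaderboard queries keep the library default (true).
function useProviderEndpointStats(vendorId: number) {
  return useQuery({
    queryKey: ["provider-endpoint-stats", vendorId],
    queryFn: () => fetchEndpointStats(vendorId),
    refetchOnWindowFocus: false,
    staleTime: 30_000,
  });
}
```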

Comment on src/repository/provider.ts, line 1140:

const promise = Promise.resolve().then(async () => {

**In-flight dedup assigned after the IIFE starts executing**

`providerStatisticsInFlight` is set on line 1231, but the callback created by `Promise.resolve().then(async () => { ... })` is already scheduled by then. In practice this is safe because `Promise.resolve().then(...)` schedules the callback as a microtask that won't run until the current synchronous execution (including line 1231) completes. The ordering is fragile, though: if this function were ever refactored to `await` anything between the IIFE creation and the `providerStatisticsInFlight` assignment, concurrent callers could bypass the dedup check.

A more robust pattern assigns the in-flight record synchronously before starting the async work:

Suggested change:

const promise = (async () => {
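For illustration, a minimal sketch of the dedup pattern the reviewer suggests, with hypothetical names (`inFlight`, `loadStatsFromDb`):

```typescript
// Hypothetical shape and loader; the real code deduplicates provider statistics.
type Stats = { todayCost: number; todayCalls: number };
declare function loadStatsFromDb(key: string): Promise<Stats>;

const inFlight = new Map<string, Promise<Stats>>();

function getStatsDeduped(key: string): Promise<Stats> {
  const existing = inFlight.get(key);
  if (existing) return existing;

  // The async IIFE runs synchronously only up to its first await, and the
  // Map entry below is set before control can return to any other caller,
  // because this whole function body is synchronous.
  const promise = (async () => {
    try {
      return await loadStatsFromDb(key);
    } finally {
      inFlight.delete(key); // always clear the dedup slot, success or failure
    }
  })();

  inFlight.set(key, promise);
  return promise;
}
```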


Comment on src/repository/provider-endpoints.ts, line 1640:

while (true) {
  const whereClauses = [

**Backfill only processes enabled providers, skipping disabled ones**

The new `eq(providers.isEnabled, true)` filter means the backfill will no longer create endpoint pool entries for disabled providers. This is intentional per the comment, but it is a subtle behavioral change: if an admin later enables a previously-disabled provider, the endpoint pool entry won't exist until the next `syncProviderEndpointOnProviderEdit` or `updateProvidersBatch` call.

The `updateProvidersBatch` path (lines 851-927) does handle this case (it inserts endpoints when enabling), and `updateProvider` also handles the `previousIsEnabled === false && transformed.isEnabled === true` transition. Just confirming this coverage is intentional: the backfill is no longer the safety net for disabled→enabled transitions.

@tesgth032
Contributor Author

This follow-up commit (ec3cf1fe):

  • Removes the global QueryClient defaultOptions (refetchOnWindowFocus: false / staleTime) from src/app/providers.tsx, restoring react-query's default behavior so non-Providers pages that rely on focus-triggered refresh keep their data freshness.
  • High-cost queries such as Providers / Usage Logs keep explicit query-level refetchOnWindowFocus: false and a suitable staleTime, so the performance gains for #779 ("frontend performance keeps degrading since 0.5.x; more provider-management bugs in 0.5.6") do not regress.

Re-ran locally: `bun run lint`, `bun run typecheck`, `bun run test`, `bun run build`; all CI checks green.

@tesgth032
Contributor Author

This follow-up commit (4093ddd1, still focused on the Endpoint Health feel of #781):

  • In EndpointTab, if the vendor/type auto-switches while a focus/visibility-triggered vendors refresh is in flight, immediately setSelectedEndpoint(null), so the UI cannot briefly trigger a Probe against the "old" endpoint (and the stale selection no longer lingers confusingly); a minimal sketch follows

Re-ran locally: `bun run lint`, `bun run typecheck`, `bun run test`, `bun run build`.
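For illustration, a minimal sketch of the selection-reset behavior, with hypothetical state and prop names (the real component is EndpointTab):

```typescript
import { useEffect, useState } from "react";

// When a focus/visibility refresh auto-switches the vendor/type, drop the
// stale endpoint selection immediately so the UI cannot trigger a Probe
// against an endpoint that belonged to the previous pool.
function useResetSelectionOnSwitch(
  vendorId: number | null,
  providerType: string | null
) {
  const [selectedEndpoint, setSelectedEndpoint] = useState<number | null>(null);

  useEffect(() => {
    setSelectedEndpoint(null);
  }, [vendorId, providerType]);

  return [selectedEndpoint, setSelectedEndpoint] as const;
}
```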

@greptile-apps bot left a comment

69 files reviewed, 4 comments

Comment on src/app/[locale]/settings/providers/_components/endpoint-latency-sparkline.tsx, lines 275-279:

const workers = Array.from({ length: Math.min(concurrency, endpointIds.length) }, async () => {
  for (;;) {
    const current = endpointIds[idx];
    idx += 1;
    if (current == null) return;

**Race condition in concurrent index increment**

While the comment claims single-threaded safety, `idx += 1` happens after an async boundary: if worker A reads `idx = 0` but yields before incrementing, worker B can also read `idx = 0`, causing both to process `endpointIds[0]`. Use an atomic claim pattern:

const current = idx++;
if (current >= endpointIds.length) return;
const endpointId = endpointIds[current];
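For illustration, a self-contained sketch of the worker pool with the claim-then-check pattern; `runWithConcurrency` and `probeOne` are hypothetical names. (Strictly, the original read-then-increment has no await between the two statements, so single-threaded JS cannot interleave them; claiming the index in one expression is simply more refactor-proof.)

```typescript
// Hypothetical names; probeOne stands in for the per-endpoint fetch.
async function runWithConcurrency(
  endpointIds: number[],
  concurrency: number,
  probeOne: (endpointId: number) => Promise<void>
): Promise<void> {
  let idx = 0;
  const workers = Array.from(
    { length: Math.min(concurrency, endpointIds.length) },
    async () => {
      for (;;) {
        const current = idx++; // claim an index in a single expression, before any await
        if (current >= endpointIds.length) return;
        await probeOne(endpointIds[current]);
      }
    }
  );
  await Promise.all(workers);
}
```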

Comment on src/repository/provider-endpoints-batch.ts, lines 83-90:

const endpointValues = sql.join(
  endpointIds.map((id) => sql`(${id})`),
  sql`, `
);

const query = sql`
  WITH endpoint_ids(endpoint_id) AS (
    VALUES ${endpointValues}

Endpoint IDs are interpolated into raw SQL via `sql.join`. Zod validates them as integers at the action layer, but prefer Drizzle's parameterized approach for defense-in-depth:

const query = sql`
  SELECT ... FROM ${providerEndpointProbeLogs}
  WHERE endpoint_id = ANY(${sql.array(endpointIds)})
  ORDER BY ...

or use `inArray(providerEndpointProbeLogs.endpointId, endpointIds)` in the WHERE clause.

Comment on src/repository/provider.ts, lines 945-948:

const idList = sql.join(
  uniqueIds.map((id) => sql`${id}`),
  sql`, `
);

IDs are interpolated into raw SQL via `sql.join`; prefer the parameterized `inArray`:

.where(and(
  inArray(providers.id, uniqueIds),
  isNull(providers.deletedAt)
))

Comment on src/repository/statistics.ts, lines 64-72:

if (keyStringByIdCache.size >= KEY_STRING_BY_ID_CACHE_MAX_SIZE) {
  const evictCount = Math.max(1, Math.ceil(KEY_STRING_BY_ID_CACHE_MAX_SIZE * 0.1));
  let remaining = evictCount;

  for (const id of keyStringByIdCache.keys()) {
    keyStringByIdCache.delete(id);
    remaining -= 1;
    if (remaining <= 0) break;
  }

**FIFO eviction doesn't maintain true LRU ordering on cache refresh**

`Map` iteration order reflects original insertion time: when a cache hit refreshes the TTL via `set` (line 76), the entry keeps its original position in iteration order, so frequently-accessed entries inserted early are evicted before rarely-accessed entries inserted later.

For true LRU, delete then re-insert on a cache hit:

keyStringByIdCache.delete(keyId);
keyStringByIdCache.set(keyId, cached);

- Reuse an observer pool keyed by (root+options), reducing observer instances in long lists/large tables
- Add unit-test coverage (test-env passthrough + sharing/release semantics)
@tesgth032
Contributor Author

This follow-up commit (fdc1470d, still focused on the long-list resource footprint of the #779 Providers page):

  • useInViewOnce now reuses a shared IntersectionObserver per (root + options), avoiding the accumulated CPU/memory/callback cost of one observer per row/card in large lists; the fire-once-on-first-entry semantics are unchanged (see the sketch below)
  • Unit tests added: test-env passthrough visibility, shared-observer reuse, and release once the last target unbinds

Re-ran locally: `bun run lint`, `bun run lint:fix`, `bun run typecheck`, `bun run test`, `bun run build`.
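For illustration, a sketch of the shared-observer-pool idea with hypothetical names (`observeShared`, `pools`); the real pool key also includes `root`, omitted here for brevity:

```typescript
type InViewCallback = (entry: IntersectionObserverEntry) => void;

type Pool = {
  observer: IntersectionObserver;
  targets: Map<Element, InViewCallback>;
};

const pools = new Map<string, Pool>();

function observeShared(
  el: Element,
  options: IntersectionObserverInit,
  cb: InViewCallback
): () => void {
  // One observer per distinct options set instead of one per row/card.
  const key = JSON.stringify([options.rootMargin ?? "", options.threshold ?? 0]);
  let pool = pools.get(key);
  if (!pool) {
    const targets = new Map<Element, InViewCallback>();
    const observer = new IntersectionObserver((entries) => {
      // Fan each entry out to the callback registered for its target.
      for (const entry of entries) targets.get(entry.target)?.(entry);
    }, options);
    pool = { observer, targets };
    pools.set(key, pool);
  }
  const active = pool;
  active.targets.set(el, cb);
  active.observer.observe(el);

  return () => {
    active.targets.delete(el);
    active.observer.unobserve(el);
    // Release the shared observer once the last target unsubscribes.
    if (active.targets.size === 0) {
      active.observer.disconnect();
      pools.delete(key);
    }
  };
}
```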


Labels

area:provider area:UI enhancement New feature or request size/L Large PR (< 1000 lines)

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

1 participant