
fix(session): fix session mix-ups when switching + conversation not found (includes accumulated feat/model branch work) #29

Merged: TYRMars merged 3 commits into main from feat/model on May 8, 2026

TYRMars (owner) commented on May 8, 2026

Summary

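  • Core fix of this session: diagnosed and fixed the user-reported issue "switching sessions mixes up conversations, and conversation `<uuid>` not found appears". Three chained root causes (the backend delays persisting a `New` conversation, the frontend `resumeConversation` performs its steps in the wrong order, and `handleScopedFrame` flips activeId on every frame), fixed with five small changes plus 4 regression tests.
  • Same-branch snapshot: the other in-progress work previously accumulated on the `feat/model` branch is committed along with it: model registry / routing policy / fallback / market / customize / observability OTel infra & UI / SubAgent runs / harness-cli telemetry, etc.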

Session-switch fix details

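| Step | File | Change |
|---|---|---|
| 1 | crates/harness-server/src/routes.rs | WS `New` calls `save_envelope` immediately, closing the path where a new session abandoned before reaching `Done` later fails to resume with a 404 |
| 2 | services/conversations.ts | Move `setActiveId` + `syncSessionRoute` ahead of `sendFrame`; export `clearSessionRoute` |
| 3 | services/frames.ts | `handleScopedFrame` no longer flips activeId and no longer broadcasts background frames to `legacyDispatchFrame` |
| 4 | services/frames/lifecycleFrames.ts | The `error` handler recognizes ``conversation `...` not found`` → removes the row and resets activeId |
| 5 | routes.rs | Resume load errors are normalized into a not-found signal so the frontend has a single cleanup path |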
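The Step 2 reordering can be illustrated with a minimal sketch. It assumes a hypothetical synchronous client (`makeClient`, the `ClientState` shape and the `/sessions/<id>` route format are illustrative, not the project's code) and a mock `sendFrame` that records `activeId` at the moment a frame goes out, which turns the real microsecond-scale window into a deterministic one.

```typescript
// Hypothetical minimal client state; the real services/conversations.ts is not shown in the PR.
interface ClientState {
  activeId: string | null;
  route: string | null;
  seenOnSend: (string | null)[]; // activeId values observed when a frame is sent
}

type Frame = { type: "resume"; id: string };

function makeClient() {
  const state: ClientState = { activeId: null, route: null, seenOnSend: [] };
  // Mock transport: observes activeId synchronously, standing in for any
  // reply or subscriber that runs before the caller's next statement.
  const sendFrame = (_frame: Frame): void => {
    state.seenOnSend.push(state.activeId);
  };
  const setActiveId = (id: string | null): void => {
    state.activeId = id;
  };
  const syncSessionRoute = (id: string): void => {
    state.route = `/sessions/${id}`; // hypothetical route format
  };
  return { state, sendFrame, setActiveId, syncSessionRoute };
}

type Client = ReturnType<typeof makeClient>;

// Pre-fix order: the frame goes out first, local state is updated afterwards.
function resumeConversationOld(c: Client, id: string): void {
  c.sendFrame({ type: "resume", id });
  c.setActiveId(id);
  c.syncSessionRoute(id);
}

// Post-fix order (Step 2): local state is updated before the frame is sent.
function resumeConversationNew(c: Client, id: string): void {
  c.setActiveId(id);
  c.syncSessionRoute(id);
  c.sendFrame({ type: "resume", id });
}

const before = makeClient();
resumeConversationOld(before, "b"); // before.state.seenOnSend -> [null]

const after = makeClient();
resumeConversationNew(after, "b"); // after.state.seenOnSend -> ["b"]
```

With the pre-fix order the mock observes `null` (the previous session's state in a real app); with the Step 2 order it already sees `"b"`.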

Regression tests: frames.test.ts adds 4 cases (3 for not-found cleanup, 1 asserting that background frames do not flip activeId).

Other work included in the same-branch snapshot

  • Model & routing: harness-llm adds profile.rs / capability_validating.rs / fallback.rs; harness-server adds route_policy.rs and the /v1/routing REST endpoint; the frontend adds a routing settings panel and FallbackBanner
  • Market & customize: harness-server::market_routes, the frontend Customize/ page + MarketPanels, services/market.ts / tools.ts
  • SubAgent runs: harness-server::subagent_runs*, harness-subagents::batch, the frontend SubAgentRunsRail
  • Observability: infra/otel/ (collector + compose), docs/observability/local-stack.md, a major rework of harness-server::observability_routes, and telemetry.rs added to harness-cli + harness/server
  • WorkOverview UI rework: HarnessObservabilityPanel / HealthCenter / KpiStrip adjusted, HarnessEvolutionPanel removed
  • harness-tools/doc.rs and the default doc skill extended; harness-mcp::manager improved

Test plan

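  • `cargo test -p harness-server`: 268/268 passing (including spec_to_done_e2e)
  • `vitest run`: 310/310 passing (including the 4 new not-found / scoped-frame regression cases)
  • `cargo clippy -p harness-server --all-targets -- -D warnings`: passes
  • `eslint --quiet` on the touched frontend files: 0 errors
  • Live WS test, Scenario A: send `new` → `started` → close immediately (no user message, never reaches done) → REST GET returns 200 (404 before the fix), and WS resume returns a `resumed` frame
  • Live WS check of the not-found message format: for a ghost id the server replies ``{type:"error", message:"conversation `<id>` not found"}``, which matches the frontend NOT_FOUND_RE
  • Surface isolation when switching sessions mid-stream: not observed live, because the Vite dev global socket failing to reconnect and multiple HMR store instances blocked real-time observation; the invariant is instead locked in by the frames.test.ts::"background-conversation frames do NOT flip activeId" unit test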
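The not-found cleanup from Step 4, together with the message format checked above, can be sketched as follows. The PR names `NOT_FOUND_RE` and `convoRows` but does not show their definitions, so the regex, the `FrontendState` shape, the surface cache and the rule "reset `activeId` only when it points at the ghost id" are all assumptions.

```typescript
// Hypothetical pattern; the PR names NOT_FOUND_RE but does not show it.
const NOT_FOUND_RE = /^conversation `([^`]*)` not found$/;

interface ConvoRow {
  id: string;
  title: string;
}

interface FrontendState {
  convoRows: ConvoRow[];
  surfaces: Map<string, unknown>; // hypothetical per-conversation surface cache
  activeId: string | null;
}

type ErrorFrame = { type: "error"; message: string };

// Returns true when the frame was a not-found error and cleanup ran.
function handleErrorFrame(state: FrontendState, frame: ErrorFrame): boolean {
  const match = NOT_FOUND_RE.exec(frame.message);
  if (!match) return false;
  const ghostId = match[1] ?? "";
  // Drop the stale sidebar row and its cached surface.
  state.convoRows = state.convoRows.filter((row) => row.id !== ghostId);
  state.surfaces.delete(ghostId);
  // Assumed meaning of "reset activeId when necessary".
  if (state.activeId === ghostId) state.activeId = null;
  return true;
}

const state: FrontendState = {
  convoRows: [
    { id: "a", title: "A" },
    { id: "ghost", title: "Stale" },
  ],
  surfaces: new Map<string, unknown>([["ghost", {}]]),
  activeId: "ghost",
};
handleErrorFrame(state, { type: "error", message: "conversation `ghost` not found" });
```

After the call only row `a` remains, the ghost surface is gone and `activeId` is `null`; unrelated error messages fall through untouched.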

Non-blocking observations

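In the Vite dev SPA, the global /v1/chat/ws connection keeps showing "Reconnecting..." (a fresh `new WebSocket` connects fine, so the problem lies in the SPA client's connection-management logic). This is unrelated to the changes in this PR; I suggest opening a separate issue to investigate it.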

🤖 Generated with Claude Code

TYRMars and others added 3 commits May 9, 2026 01:09
…ch race + branch snapshot

## Session-switch fix (this session's focused work)

Fixes "session mix-ups when switching + conversation `<uuid>` not found":

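- **Backend persists `New` immediately** (crates/harness-server/src/routes.rs): eliminates the 404 path hit when a new session is abandoned before reaching `Done`.
- **Backend normalizes Resume load errors into a not-found signal** (routes.rs): corrupt JSON and IO hiccups take the same cleanup path as `Ok(None)`.
- **Frontend moves `setActiveId` earlier** (services/conversations.ts): eliminates the microsecond-scale race window between `sendFrame` and `setActiveId`; exports `clearSessionRoute`.
- **Frontend `handleScopedFrame` no longer flips `activeId`** (services/frames.ts): the dozen or more background streaming frames per second no longer make global subscribers see activeId jitter, and background frames are no longer broadcast to legacyDispatchFrame.
- **Frontend recognizes not-found errors and cleans up stale rows** (services/frames/lifecycleFrames.ts): removes the ghost id from `convoRows`, clears its surface cache, and resets activeId when necessary.
- **Regression tests** (services/frames.test.ts): 4 new cases lock in the not-found cleanup invariant and the "background frames do not flip activeId" invariant.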
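The `handleScopedFrame` invariant described in the list above can be sketched under a few assumptions: frames carry a `conversationId`, each conversation owns a cached surface, and only frames for the active conversation still reach `legacyDispatchFrame`. The `Store` shape and the `activeIdChanges` counter (standing in for what global subscribers would observe) are illustrative, not the project's implementation.

```typescript
type ScopedFrame = { conversationId: string; kind: "delta" | "done"; text?: string };

interface Surface {
  text: string;
  done: boolean;
}

interface Store {
  activeId: string | null;
  surfaces: Map<string, Surface>;
  activeIdChanges: number; // how many activeId notifications subscribers would see
}

function setActiveId(store: Store, id: string | null): void {
  if (store.activeId !== id) {
    store.activeId = id;
    store.activeIdChanges += 1;
  }
}

// Hypothetical post-fix handler: updates only the frame's own surface,
// never calls setActiveId, and forwards only active-conversation frames.
function handleScopedFrame(
  store: Store,
  frame: ScopedFrame,
  legacyDispatchFrame: (f: ScopedFrame) => void,
): void {
  const surface = store.surfaces.get(frame.conversationId) ?? { text: "", done: false };
  if (frame.kind === "delta") surface.text += frame.text ?? "";
  else surface.done = true;
  store.surfaces.set(frame.conversationId, surface);
  if (frame.conversationId === store.activeId) legacyDispatchFrame(frame);
}

const store: Store = { activeId: null, surfaces: new Map(), activeIdChanges: 0 };
setActiveId(store, "foreground"); // explicit user switch
const legacy: ScopedFrame[] = [];
for (let i = 0; i < 12; i++) {
  handleScopedFrame(store, { conversationId: "background", kind: "delta", text: "x" }, (f) => legacy.push(f));
}
handleScopedFrame(store, { conversationId: "foreground", kind: "delta", text: "hi" }, (f) => legacy.push(f));
```

Twelve background deltas update only the background surface; `activeId` changes once (the explicit switch) and the legacy dispatcher sees only the foreground frame.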

Design doc: see the plan file (not committed).

## Same-branch snapshot (batch commit of other earlier work)

Commits the other in-progress work accumulated on the feat/model branch:
- model registry / routing policy / capability validating provider profile (harness-llm + harness-server)
- fallback events + UI banner + slice
- market + tools + customize page + routing settings section
- subagent runs rail + REST routes + batch
- harness-tools/doc.rs and the harness-skill default doc skill extension
- observability (OTel) infra + docs + a major rework of observability_routes
- HarnessObservabilityPanel / HealthCenter / KpiStrip / WorkOverview UI rework
- jarvis-cli adds telemetry + web subcommands
- server-side extensions to chat_runs / mcp_routes / state / market_routes / route_policy / subagent_runs*, etc.

## Verification

- cargo test -p harness-server: 268/268 ✅
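- vitest: 310/310 ✅ (including the 4 new not-found / scoped-frame regression cases)
- cargo clippy -p harness-server -- -D warnings: passes
- ESLint on touched files: 0 errors
- Live test: after WS `new` followed by an immediate close, REST GET returns 200 (404 before the fix); resuming a nonexistent id makes the backend return ``error: conversation `<id>` not found``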

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
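The two backend changes in the commit above (persist on `New`, and collapse every Resume load failure into one not-found signal) can be sketched in TypeScript for consistency with the other examples; the real code is Rust in `routes.rs`. `ConversationStore`, `LoadResult` and the `corrupt` helper are hypothetical, with `LoadResult` loosely mirroring `Ok(Some)` / `Ok(None)` / `Err`.

```typescript
interface Envelope {
  id: string;
  messages: string[];
}

type LoadResult =
  | { kind: "ok"; envelope: Envelope | null } // ~ Ok(Some(..)) / Ok(None)
  | { kind: "err"; reason: string }; // ~ Err(corrupt JSON / IO error)

// Hypothetical in-memory stand-in for the on-disk conversation store.
class ConversationStore {
  private saved = new Map<string, string>(); // id -> serialized envelope

  saveEnvelope(env: Envelope): void {
    this.saved.set(env.id, JSON.stringify(env));
  }

  load(id: string): LoadResult {
    const raw = this.saved.get(id);
    if (raw === undefined) return { kind: "ok", envelope: null };
    try {
      return { kind: "ok", envelope: JSON.parse(raw) as Envelope };
    } catch (e) {
      return { kind: "err", reason: String(e) };
    }
  }

  // Sketch-only helper that simulates a corrupted file.
  corrupt(id: string): void {
    this.saved.set(id, "{not json");
  }
}

// Step 1: persist as soon as `New` is handled, not only on `Done`.
function handleNew(store: ConversationStore, id: string): { type: "started"; id: string } {
  store.saveEnvelope({ id, messages: [] });
  return { type: "started", id };
}

// Step 5: missing and unreadable conversations produce the same error frame.
function handleResume(
  store: ConversationStore,
  id: string,
): { type: "resumed"; id: string } | { type: "error"; message: string } {
  const result = store.load(id);
  if (result.kind === "ok" && result.envelope !== null) return { type: "resumed", id };
  return { type: "error", message: `conversation \`${id}\` not found` };
}

const convStore = new ConversationStore();
handleNew(convStore, "c1"); // client closes right after `started`, never reaches Done
const resumed = handleResume(convStore, "c1");
const ghost = handleResume(convStore, "ghost");
convStore.corrupt("c1");
const corrupted = handleResume(convStore, "c1");
```

Because a corrupted envelope yields exactly the same frame as a missing one, the frontend needs only one cleanup path.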
Slow CI runner (Linux GitHub Actions) was racing past the wait loop
before all four drive tasks had written their initial Pending row,
producing a downstream `completed=2 of 4` failure even though the
core invariant (peak in-flight ≤ 2) was intact.

Add `runs.len() >= 4` to the loop's exit condition so the wait
holds until every spawned drive task has at least persisted its
Pending record, and surface `runs_total` / `pending_or_running` in
the deadline assertion message for future debugging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
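The CI wait-loop fix above can be reduced to its exit predicate. This sketch assumes the pre-fix loop exited once no recorded run was still `Pending` or `Running` (the commit does not show the original condition); `RunRow`, `readyOld` and `readyNew` are hypothetical names, and `EXPECTED_RUNS` mirrors `runs.len() >= 4`.

```typescript
type RunStatus = "Pending" | "Running" | "Completed";

interface RunRow {
  id: number;
  status: RunStatus;
}

// The test spawns four drive tasks; mirrors `runs.len() >= 4`.
const EXPECTED_RUNS = 4;

function pendingOrRunning(runs: readonly RunRow[]): number {
  return runs.filter((r) => r.status === "Pending" || r.status === "Running").length;
}

// Assumed pre-fix condition: exit once nothing recorded is still in flight.
function readyOld(runs: readonly RunRow[]): boolean {
  return pendingOrRunning(runs) === 0;
}

// Post-fix: also require every spawned task to have persisted its row.
function readyNew(runs: readonly RunRow[]): boolean {
  return runs.length >= EXPECTED_RUNS && pendingOrRunning(runs) === 0;
}

// Diagnostic text for the deadline assertion (runs_total / pending_or_running).
function deadlineMessage(runs: readonly RunRow[]): string {
  return `runs did not settle: runs_total=${runs.length} pending_or_running=${pendingOrRunning(runs)}`;
}

// Slow runner: only two of the four tasks have written a row, and both finished.
const snapshot: RunRow[] = [
  { id: 1, status: "Completed" },
  { id: 2, status: "Completed" },
];
```

On this snapshot the assumed old predicate lets the loop exit early (the `completed=2 of 4` symptom), while the new one keeps waiting.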
CI's ESLint enforces `@typescript-eslint/no-unnecessary-type-assertion`
as an error. Two new files from this branch's bundled work tripped it:

- RoutingSection.tsx:222 — `policy[slot] as ModelTarget | undefined`
  is redundant; `RouteSlot` keys narrow `RoutePolicy` to that shape.
- ToolsSection.tsx:181/186/191 — three `Select` onChange handlers
  cast `v as ToolSourceKind | ""`, `v as ToolPack | ""` and `v as ToolRisk | ""`
  respectively, but `Select<T extends string>` already infers `T` from the typed
  `value` prop, so `v` is already the correct narrowed type.

Local repro: `npx eslint --quiet src/components/Settings/sections/{Routing,Tools}Section.tsx`.
After fix: 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
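Why the `ToolsSection.tsx` casts in the commit above were redundant can be shown with a minimal stand-in for `Select<T extends string>`. `SelectProps`, `selectAndPick`, `renderRiskFilter` and the `ToolRisk` members are hypothetical; the point is only that `T` is inferred from the typed `value` prop, so `onChange`'s `v` already has the narrowed type.

```typescript
type ToolRisk = "low" | "medium" | "high"; // hypothetical members

// Hypothetical minimal stand-in for the project's Select<T extends string> props.
interface SelectProps<T extends string> {
  value: T;
  options: readonly T[];
  onChange: (v: T) => void;
}

// Simulates the user choosing the option at `index`.
function selectAndPick<T extends string>(props: SelectProps<T>, index: number): void {
  const chosen = props.options[index];
  if (chosen !== undefined) props.onChange(chosen);
}

const RISK_OPTIONS: readonly (ToolRisk | "")[] = ["", "low", "medium", "high"];

function renderRiskFilter(current: ToolRisk | "", set: (r: ToolRisk | "") => void, pickIndex: number): void {
  selectAndPick(
    {
      value: current, // T is inferred as ToolRisk | "" here
      options: RISK_OPTIONS,
      // `v` is already ToolRisk | "", so writing `v as ToolRisk | ""` would be
      // reported by @typescript-eslint/no-unnecessary-type-assertion.
      onChange: (v) => set(v),
    },
    pickIndex,
  );
}

const result: { risk: ToolRisk | "" } = { risk: "" };
renderRiskFilter("", (r) => {
  result.risk = r;
}, 3);
```

The callback type-checks without any assertion, which is exactly what the lint rule relies on.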
@TYRMars TYRMars merged commit d35190b into main May 8, 2026
1 check passed
TYRMars pushed a commit that referenced this pull request May 10, 2026
Bumps `[workspace.package].version` from 0.1.0 to 0.2.0 (cascades to
every member crate via `version.workspace = true`) and reframes the
existing `## Unreleased` block as the `0.2.0` release.

The 0.2.0 entry consolidates four PRs that landed since this CHANGELOG
was first introduced:

- PR #28 (OTel + model registry): observability stack, model registry +
  routing policy + capability-validating provider profile, fallback UI,
  Customize / Market panels, jarvis-cli telemetry/web subcommands.
- PR #29 (model + session race fix): session-switch race repair around
  `conversation <uuid> not found`, scoped-frame invariants, 4 new
  vitest cases.
- PR #30 (channels + sidebar + auto-mode refactor): WeCom WebSocket
  gateway adapter, Channels REST + Settings section, Codex-style chat
  sidebar with project groups, ~700 LOC trimmed from auto_mode.rs,
  RUN_TIMEOUT default 5m → 10m.
- PR #31 (Makefile + docs): align local cargo commands with CI's
  `--exclude jarvis-desktop` invariant.

Plus the original CHANGELOG-introducing PR's web composer / workspace
probe / WS metadata work.

Verified: `cargo check --workspace --exclude jarvis-desktop` passes,
Cargo.lock picks up the version bump (25 entries flipped).

`## Unreleased` is left as a fresh "_No changes yet._" placeholder so
future PRs have an obvious place to land.