feat(desktop): show the model provider on the trigger button #5
…#3953) (esengine#3957)

A proxy that idle-closes the SSE connection with a clean FIN ends the scan loop with no error, so the turn was committed as complete -- with whatever fraction of the tool-call arguments had streamed. DeepSeek then rejects every subsequent request that replays the truncated JSON with HTTP 400, bricking the session.

Two layers:
- readStream now requires [DONE] or a finish_reason; a clean EOF without either is a connection cut, which the existing replay / StreamInterrupted recovery handles (partial calls are never emitted).
- SanitizeToolPairing closes truncated argument JSON before requests are built, so sessions already poisoned by this bug resume working.

Co-authored-by: reasonix <reasonix@deepseek.com>
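The first layer can be pictured with a minimal sketch. The function name `classifyStream` and the outcome labels are hypothetical; only the rule itself (a stream counts as complete only after `[DONE]` or a `finish_reason`) comes from the commit message, and the real `readStream` is Go code that is not shown here.

```typescript
// Hypothetical helper: decides whether a finished SSE scan was a real
// completion or a silent connection cut.
type StreamOutcome = "complete" | "interrupted";

function classifyStream(lines: string[]): StreamOutcome {
  let sawTerminator = false;
  for (const raw of lines) {
    const line = raw.trim();
    if (!line.startsWith("data:")) continue; // skip comments and keep-alives
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      sawTerminator = true;
      break;
    }
    try {
      const chunk = JSON.parse(payload) as {
        choices?: { finish_reason?: string | null }[];
      };
      // A non-empty finish_reason on any choice also marks a proper end.
      if (chunk.choices?.some((c) => typeof c.finish_reason === "string" && c.finish_reason !== "")) {
        sawTerminator = true;
      }
    } catch {
      // A malformed chunk is not a terminator; keep scanning.
    }
  }
  // Clean EOF without a terminator is treated as a cut, not a completed turn.
  return sawTerminator ? "complete" : "interrupted";
}
```

A caller would route the `"interrupted"` outcome into whatever replay or recovery path it already has, rather than committing the partial turn.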
… dual-model sessions (esengine#3958) In planner+executor mode the executor session opens with the handoff boilerplate, so every session preview and auto-generated title collapsed to 'Reasonix executor'. HandoffTask extracts the embedded original task; the session picker, serve preview, and desktop topic titles all use it. Closes esengine#3860 Co-authored-by: reasonix <reasonix@deepseek.com>
…ig.json (esengine#3960) v0.x stored MCP servers in two shapes: the canonical mcpServers map and the older `mcp` list of --mcp-format strings (name=cmd args, name=URL for SSE, name=streamable+URL), with mcpEnv/mcpDisabled keyed by name. Both the one-time migration and the live lowest-priority legacy source only read mcpServers, so servers configured the old way -- including the default memory server -- silently vanished on upgrade. Parse the string list in both paths; mcpServers wins a name collision, matching the v0.x merge order. Anonymous specs get a synthesized name since v1+ plugins require one. Closes esengine#3949 Co-authored-by: reasonix <reasonix@deepseek.com>
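A rough sketch of the legacy parsing described above. The type `LegacyMcpServer`, both function names, and the synthesized `legacy-N` naming scheme are assumptions for illustration; the three string shapes and the "mcpServers wins" rule are taken from the commit message.

```typescript
// Hypothetical shape for a parsed server entry.
interface LegacyMcpServer {
  transport: "stdio" | "sse" | "streamable";
  command?: string;
  args?: string[];
  url?: string;
}

// Parses one legacy spec: name=cmd args, name=URL, or name=streamable+URL.
function parseLegacySpec(spec: string, index: number): [string, LegacyMcpServer] {
  const named = /^([A-Za-z0-9_-]+)=(.+)$/.exec(spec.trim());
  // Anonymous specs get a synthesized name (the naming scheme is assumed).
  const name = named ? named[1] : `legacy-${index + 1}`;
  const body = (named ? named[2] : spec).trim();
  if (body.startsWith("streamable+")) {
    return [name, { transport: "streamable", url: body.slice("streamable+".length) }];
  }
  if (/^https?:\/\//.test(body)) {
    return [name, { transport: "sse", url: body }];
  }
  const [command, ...args] = body.split(/\s+/);
  return [name, { transport: "stdio", command, args }];
}

// Merges both shapes; the canonical mcpServers map wins a name collision.
function mergeLegacyServers(
  mcpServers: Record<string, LegacyMcpServer>,
  legacyList: string[],
): Record<string, LegacyMcpServer> {
  const merged: Record<string, LegacyMcpServer> = {};
  legacyList.forEach((spec, i) => {
    const [name, server] = parseLegacySpec(spec, i);
    merged[name] = server;
  });
  return { ...merged, ...mcpServers };
}
```

Spreading `mcpServers` last is what gives it priority over the string list.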
…sengine#3961) will-change: transform plus a fill-mode that keeps the final transform applied left the settings/history/trash modal permanently composited, rasterized at the entrance-animation scale. On Windows fractional DPI the whole panel rendered blurry until a tab switch or resize forced a repaint. Let the entrance animation end with no retained style so the modal returns to the normal raster path. Closes esengine#3838 Closes esengine#3902 Closes esengine#3899 Co-authored-by: reasonix <reasonix@deepseek.com>
…sengine#3963) The darwin chrome blanket-marks the whole tabbar no-drag and leaves only a 10px rail at the very top as the drag region, so dragging the visible titlebar area selects text instead of moving the window. Re-enable drag on the spacer between the last tab and the search button and stretch it to the strip height -- the area every user actually grabs. Closes esengine#3853 Closes esengine#3852 Co-authored-by: reasonix <reasonix@deepseek.com>
…-resume maintenance (esengine#3968)

* feat(agent): prune stale tool results before folding and on cold resume

  Stale tool results are re-derivable (files re-read, commands re-run), so eliding them is a free, lossless alternative to the paid summarize fold. Prune runs only where a cache reset is already being paid: at the compact trigger, where it skips the fold entirely when eliding alone clears the threshold, and on resume after the provider prefix cache has expired (cacheColdAfter), where rewriting history costs no extra misses and directly shrinks the full-price first request. No message is ever removed, so tool_call/result pairing and signed reasoning stay intact by construction; originals are archived like fold drops.

* bench(e2e): context-maintenance driver: cold-resume A/B + placeholder comprehension

  Three real-API scenarios for the prune work: seed two identical-shape fat sessions, A/B the cold-restart miss tokens with and without pruning after the provider cache has expired, and verify the model re-reads a file behind a prune placeholder instead of answering from nothing.

* fix(control): widen cold-resume prune threshold to a risk-asymmetric default

  Pruning a still-cached session costs ~4x the miss tokens of leaving it alone (measured warm-cache A/B), while a threshold that is too large only forgoes a free prune. Default to 24h until the cache-ttl probe pins the real retention; never set below it.

* bench(e2e): check write/unmarshal errors in the maintenance driver (errcheck)

* fix(control): derive cold-resume idle from branch meta only (CodeQL go/path-injection)

  The os.Stat mtime fallback fed a user-influenced path straight into a filesystem call. Branch meta is guaranteed for every session the controller has snapshotted, so the fallback only ever covered never-saved imports; those now skip one prune until their first snapshot creates the meta.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
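The pruning idea can be sketched as below. Everything here is illustrative: the message shape, the placeholder text, and the "keep the most recent N tool results" definition of stale are assumptions, since the commit message does not spell them out. What it does state, and what the sketch preserves, is that no message is removed, originals are archived, and the cold-resume default is 24h.

```typescript
// Hypothetical message shape for illustration only.
interface HistoryMsg {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCallId?: string;
}

const PRUNE_PLACEHOLDER = "[tool result pruned; re-run the tool to see it again]";
const DEFAULT_COLD_AFTER_MS = 24 * 60 * 60 * 1000; // 24h default from the commit message

// Prune on resume only once the provider cache is assumed to have expired.
function isCacheCold(idleMs: number, coldAfterMs: number = DEFAULT_COLD_AFTER_MS): boolean {
  return idleMs >= coldAfterMs;
}

// Elides the content of older tool results in place. No message is removed,
// so every tool_call keeps its paired result; originals go to the archive.
function pruneStaleToolResults(
  history: HistoryMsg[],
  keepRecent: number,
): { pruned: HistoryMsg[]; archived: HistoryMsg[] } {
  const pruned = history.slice();
  const archived: HistoryMsg[] = [];
  let seen = 0;
  for (let i = pruned.length - 1; i >= 0; i--) {
    const msg = pruned[i];
    if (msg.role !== "tool") continue;
    seen++;
    if (seen <= keepRecent || msg.content === PRUNE_PLACEHOLDER) continue;
    archived.push(msg);
    pruned[i] = { ...msg, content: PRUNE_PLACEHOLDER };
  }
  return { pruned, archived };
}
```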
…ch kbd (esengine#3964) Follow-up to esengine#3962: the removed legacy block was the only carrier of the .palette__item:focus-visible outline, and the detached chrome search button (rendered on Windows/Linux only) was hardcoded to the mac glyph. Closes esengine#3841 Co-authored-by: reasonix <reasonix@deepseek.com>
…ispatches (esengine#3951) * fix(chat): keep each tool's marker under its own card in back-to-back dispatches When the model dispatched two Bash tools in quick succession, late ToolProgress chunks for the first tool no longer matched the current toolStreamID, so streamToolOutput fell through to the generic collapse-then-append path. The fresh live block landed at the tail of the transcript under the second tool's card, and the first tool's collapsed ⎿ marker stacked beneath the second card as well — making the two runs visually indistinguishable. The fix threads the slot beginToolRunning recorded for the first dispatch through shellTranscriptIdx. When a late progress (or late result) for that id arrives, streamToolOutput and collapseToolOutput now reuse the recorded slot instead of appending, so each tool's live block and final summary stay directly under its own card regardless of dispatch/progress arrival order. Adds TestConsecutiveToolCallsKeepMarkersUnderOwnCard to lock the behaviour in: it verifies both markers are present and that the first card's marker previews the full output, not just the chunk visible before the second tool took over. * fix(chat): clear the slot for back-to-back non-shell tools The back-to-back dispatch fix in 8b1dcd7 records every dispatched id in shellTranscriptIdx, so a late ToolResult for an earlier tool can land in the correct slot. But for non-shell-prefixed tools (e.g. read_file dispatched in parallel) the streaming state belongs to the current id and the shellOutputs accumulator is never populated, so the late path in collapseShellSlot computed n = -1 and rendered the final else branch as '⎿ -1 lines'. The visual result was two negative-count markers stacked at the end of the transcript, one per parallel tool. Guard n < 0 in collapseShellSlot by treating the unknown as zero output: clear the slot rather than fabricate a count. 
The id stays recorded in shellTranscriptIdx (set by beginToolRunning) so a late ToolProgress that finally arrives for that id can still extend the slot in place. Add TestConsecutiveNonShellToolsDoNotRenderNegativeLineCount, a minimal regression covering the read_file/read_file case the reviewer flagged: two back-to-back dispatches, late ToolResult for the first one. Verified by stashing the chat_tui.go change and rerunning — the test fails with the exact '-1 lines' transcript the user reported.
Follow-up to esengine#3951: the late-result branch of collapseToolOutput mutated the transcript without setting transcriptDirty, so the rewritten slot waited for the next unrelated event to paint. Set the flag inside collapseShellSlot, covering both callers. Co-authored-by: reasonix <reasonix@deepseek.com>
…sengine#3978)

* feat(desktop): user-initiated crash reporting via crash.reasonix.io

  The crash overlay could only ask users to screenshot or copy the error, so most reports never reached us. Add an opt-in-per-click "Send report" button wired to a new ReportCrash binding that scrubs user names from paths, attaches version/GOOS/GOARCH, and POSTs to a Cloudflare Worker (workers/crash-report): D1-backed, fingerprint-deduped (5 raw samples per group), per-IP rate limited. Nothing is ever sent automatically.

* chore(workers): pin crash-report D1 database_id from initial deploy

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
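As an illustration of the path scrubbing mentioned above, here is a small sketch. The function name `scrubUserPaths`, the `<user>` token, and the exact patterns are assumptions; the actual ReportCrash binding is Go code whose rules are not shown on this page.

```typescript
// Hypothetical scrubber: replaces the user-name component of common
// home-directory paths before a report leaves the machine.
function scrubUserPaths(text: string): string {
  return text
    // /Users/<name>/... (macOS) and /home/<name>/... (Linux)
    .replace(/(\/(?:Users|home)\/)[^\/\s:]+/g, "$1<user>")
    // C:\Users\<name>\... (Windows)
    .replace(/([A-Za-z]:\\Users\\)[^\\\s:]+/g, "$1<user>");
}
```

Only the name segment is replaced, so the rest of the path stays useful for grouping crashes by location.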
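- Remove the duplicate palette CSS rule set and use the main styles throughout
- Add a dedicated icon for each command (SquarePen, History, Trash2, etc.)
- Session items show the workspace path, relative date, and number of conversation turns
- Turn counts use tabular-nums for fixed-width alignment
- The hint switches to a flex layout, with path, date, and turn count each taking one slot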
…ommand-string drift (esengine#3982)

* fix(evidence): match paraphrased verification commands and guide complete_step self-correction

  complete_step rejected real verifications whenever the cited command string was not byte-identical to the bash receipt: a dropped cd prefix (esengine#2917), a flag or quote-style drift, or a piped tail all failed both the ledger match and the esengine#3587 session fallback's prefix matching. Local session forensics show 5 of 18 real complete_step calls rejected this way, each cascading into todo_write failures and final answers that overclaim.

  Match commands by shell segment instead: split cited and ran commands on &&/||/;/|/newlines, quote-strip and whitespace-normalize tokens, and accept a cited segment when a ran segment equals it or supersets its tokens under the same head token. One-token citations still require exact equality, and an aggregated citation that no single command covers is still rejected. The session fallback now uses the same matcher and skips calls whose recorded result is an error or block, closing the false positive where any attempted command counted as proof.

  Rejections now carry recovery context: ran-but-nonzero commands are distinguished from never-ran (with a '|| true' hint for negative verification, e.g. proving a file is gone), never-ran rejections list the turn's actual receipts, and the schema marks command/paths as required for their kinds instead of advertising them optional.

* fix(evidence): count bash commands naming a path as files receipts

  Files created or edited through shell redirection (seq … > file, sed -i) leave no reader/writer receipt, so files evidence for them was always rejected and the model had to re-write the file with write_file just to mint a receipt. A successful bash command whose text names the path now counts as having touched it.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
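The segment matcher can be approximated as follows. This is a simplified sketch, not the project's Go implementation: the function names are hypothetical, the split ignores separators that appear inside quotes, and "no single command covers" is read as "all cited segments must be covered by one ran command".

```typescript
// Splits a command on &&, ||, ;, | and newlines, strips quotes, and
// whitespace-normalizes each segment into tokens.
function toSegments(command: string): string[][] {
  return command
    .split(/&&|\|\||[;|\n]/)
    .map((seg) => seg.replace(/['"]/g, "").trim().split(/\s+/).filter((t) => t.length > 0))
    .filter((tokens) => tokens.length > 0);
}

// A ran segment covers a cited one when it is equal, or when it shares the
// head token and contains every cited token. One-token citations must match exactly.
function segmentCovers(ran: string[], cited: string[]): boolean {
  if (cited.length === 1) return ran.length === 1 && ran[0] === cited[0];
  if (ran[0] !== cited[0]) return false;
  return cited.every((tok) => ran.includes(tok));
}

// Accepts a citation only if a single ran command covers all of its segments.
function citationMatches(cited: string, ranCommands: string[]): boolean {
  const citedSegs = toSegments(cited);
  if (citedSegs.length === 0) return false;
  return ranCommands.some((ran) => {
    const ranSegs = toSegments(ran);
    return citedSegs.every((c) => ranSegs.some((r) => segmentCovers(r, c)));
  });
}
```

With this shape, a dropped `cd` prefix or a piped `tail` no longer matters, because those become separate segments that the cited command never has to mention.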
…sengine#3985)

Crash reports alone answer "what broke" but not "for whom" or "how many are even running this". Three additions, all on the existing worker:
- Launch ping: one POST per start with a random install id + version + OS facts. Gated on new desktop.telemetry config (default on, toggle in Settings > Updates, disclosed in desktop/README.md); dev builds never ping. Carries no conversation, key, or file data.
- Crash reports now attach coarse device facts (OS version, CPU model, cores, RAM) so "only crashes on X" patterns are visible.
- Worker grows /v1/ping (per-day install dedup with opens counter) and a Basic-auth /stats page (daily actives, version/platform breakdown, recent crash groups) so day-to-day reading needs no SQL.

Co-authored-by: reasonix <reasonix@deepseek.com>
…animation (esengine#3991) De-card tool/reasoning/step rows so the transcript reads as one quiet column: shared reasoning__head fold line, ChevronRight everywhere, long commands truncate instead of wrapping, and running rows get a text-clip shimmer sweep instead of a block background. Adds a transcript display mode setting (standard/compact/minimal) persisted in reasonix.toml and hydrated into the frontend at boot so config stays the source of truth. Co-authored-by: wufengfan <wufengfan@wufengfandeMacBook-Air.local>
Prompt only for active model keys
Co-authored-by: reasonix <reasonix@deepseek.com>
…up detail pages (esengine#3997) Co-authored-by: reasonix <reasonix@deepseek.com>
…ds, 0s label, thinking flicker) (esengine#4000)

* fix(desktop): keep step folds in the centered transcript column

  .turn-collapse and .readonly-batch set margin: Npx 0, which outranks the .transcript > * auto centering by source order, so compact-mode folds rendered flush against the pane's left edge.

* fix(desktop): make compact-mode step folds honest and calm

  Processed folds counted reasoning-only assistants as content, so minimal mode produced expandable folds over an empty body, and sub-second batches labeled themselves 0s. Filter to items the body actually renders (hiding the fold when nothing survives), drop the seconds suffix below 1s, and thread subcalls so nested tool calls show inside the fold. Streaming reasoning no longer auto-expands in compact/minimal; steps fold away on completion, so auto-open read as open/close flicker.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
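- Command items switch to a 4-5 column grid layout that compactly shows icon + title
- Session items keep the list layout, showing path, date, and turn count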
* fix: improve desktop context status metrics * fix: address context metrics review feedback
… todos (esengine#4006) * fix(evidence): tolerate citation drift when matching complete_step to todos The todo-step matcher demanded byte-exact (case-folded) equality between complete_step.step and a todo's text, so a fullwidth/halfwidth colon or whitespace drift ("Phase 5:…" cited as "Phase 5: …") could never match and the model looped on "no matching todo_write item" retries, burning tokens (discussion esengine#3970). Same disease esengine#3982 cured for command citations, different limb. Normalize both sides (fullwidth ASCII → halfwidth, whitespace dropped, case-folded) before comparing, fall back to unique substring containment (≥6 runes; ambiguous citations stay unmatched), and list this turn's todos in the rejection so the model can self-correct by verbatim content or index instead of guessing. * style: gofmt evidence_test (CJK-width map alignment) --------- Co-authored-by: reasonix <reasonix@deepseek.com>
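A sketch of the tolerant todo matching described above. The function names are hypothetical, and the containment direction (the citation contained in the todo text) is an assumption; the normalization steps and the 6-rune minimum follow the commit message.

```typescript
// Normalizes text: fullwidth ASCII (U+FF01..U+FF5E) to halfwidth,
// all whitespace dropped, case-folded.
function normalizeStep(s: string): string {
  let out = "";
  for (const ch of s) {
    const cp = ch.codePointAt(0)!;
    out += cp >= 0xff01 && cp <= 0xff5e ? String.fromCodePoint(cp - 0xfee0) : ch;
  }
  return out.replace(/\s+/g, "").toLowerCase();
}

// Returns the index of the matching todo, or -1 when there is no match
// or the citation is ambiguous.
function matchTodo(cited: string, todos: string[]): number {
  const want = normalizeStep(cited);
  const normalized = todos.map(normalizeStep);
  const exact = normalized.indexOf(want);
  if (exact !== -1) return exact;
  // Fallback: unique substring containment, only for citations of 6+ runes.
  if ([...want].length < 6) return -1;
  const hits = normalized
    .map((t, i) => (t.includes(want) ? i : -1))
    .filter((i) => i !== -1);
  return hits.length === 1 ? hits[0] : -1;
}
```

Leaving ambiguous citations unmatched keeps the matcher from silently completing the wrong step.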
…e#3994) Stale or empty model refs fell back to the first configured provider, ignoring the user's default_model. Try default_model (when keyed) before iterating providers. Closes esengine#3801
…sengine#4010) Parallel call_N-style bash tools lost their per-call line count: streamToolOutput reset the active id's count on every switch and collapseShellSlot's late path only recovered it from shellOutputs (shell- ids only), so call_N ids rendered '-1 lines'. Stash the per-id count before reset and accept the ToolResult output as a last-resort source. Closes esengine#4003
…gine#4014) Per-turn evidence ledger reset made complete_step reject cross-turn citations and let the final gate miss an unfinished plan. diff/files evidence now falls back to the full session (like commands, esengine#3587); the host keeps a canonical todo list (survives turns + compaction) the gate consults; a successful complete_step advances that list so the model no longer batches todo_write (esengine#3909). Real-API A/B confirmed base rejects/blocks where the PR accepts/advances. Closes esengine#2917
…sengine#3892) Add Feishu/Lark + WeChat connection flows with status management, diagnostics, and scoped connection persistence (env-name credentials, no raw secrets). Surface IM-origin conversations in the desktop UI with sidebar management and connection details; per-platform session model/workspace routing in the gateway.
esengine#4022)

* fix(openai): round-trip reasoning_content on DeepSeek tool_calls turns

  DeepSeek's thinking mode now rejects an assistant tool_calls turn whose reasoning_content was dropped on replay (400 "reasoning_content … must be passed back to the API"). The provider stripped it unconditionally, so any cache-miss replay of a tool-calling history (session resume, compaction, or a turn after the prompt cache expires) 400s, while warm consecutive turns are tolerated because DeepSeek still holds the reasoning server-side.

  Round reasoning_content back, but only on the assistant turn that carries tool calls and only for the DeepSeek protocol: a plain assistant text turn does not require it, and other backends still bill it as input for nothing. Reasoning enters the cached prefix and is reused on later turns, so the cost is one miss per chain, not a cache collapse.

* feat(agent): add finish_reason diagnostics to the empty-final notice

  The empty-final warning carried no context, so reports couldn't tell its three causes apart: a length-truncated turn, a reasoning-only stop, or a provider that swallowed the answer. Include the provider, the finish reason, and the reasoning length so each occurrence is self-diagnosing.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
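The round-trip condition can be sketched as a small serializer. The types, the function name `toWireMessage`, and the protocol labels are hypothetical; the rule (attach `reasoning_content` only on an assistant turn with tool calls, and only for the DeepSeek protocol) is from the commit message, and the `tool_calls` layout follows the OpenAI-compatible chat format.

```typescript
// Hypothetical in-memory shape of a stored assistant turn.
interface AssistantTurn {
  role: "assistant";
  content: string;
  reasoning?: string;
  toolCalls?: { id: string; name: string; arguments: string }[];
}

type WireProtocol = "deepseek" | "generic";

// Serializes one assistant turn for replay in a chat-completions request.
function toWireMessage(turn: AssistantTurn, protocol: WireProtocol): Record<string, unknown> {
  const wire: Record<string, unknown> = { role: "assistant", content: turn.content };
  const calls = turn.toolCalls ?? [];
  if (calls.length > 0) {
    wire.tool_calls = calls.map((c) => ({
      id: c.id,
      type: "function",
      function: { name: c.name, arguments: c.arguments },
    }));
  }
  // Reasoning goes back only where it is required: tool-call turns on DeepSeek.
  if (protocol === "deepseek" && calls.length > 0 && turn.reasoning) {
    wire.reasoning_content = turn.reasoning;
  }
  return wire;
}
```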
fix(desktop): refine command palette styling and information display
…sengine#4023) The rule :root[data-theme-style] .msg--user .msg__body (with no specific style value) leaked graphite's inverted-color bubble (--fg as background, --bg as text) onto all other themes like carbon, aurora, and slate. Fix: narrow both selectors to [data-theme-style="graphite"]. Co-authored-by: wufengfan <wufengfan@wufengfandeMacBook-Air.local>
…sengine#4030) Adds the worker side of the desktop opt-in metrics flush: a generic (date, version, os, signal, bucket) -> count table, a zod-validated /v1/metrics POST that upserts per-launch aggregate snapshots, a dedicated rate-limit bucket, and a stats-page section that breaks each signal down by bucket. The signal set is an enum and the bucket is length+charset capped, so the table can never be polluted with arbitrary keys. No install id and no content reach this endpoint — only enumerated counters. Deploy is owner-run: apply schema.sql to D1, then wrangler deploy. Co-authored-by: reasonix <reasonix@deepseek.com>
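To make the "cannot be polluted with arbitrary keys" property concrete, here is a dependency-free sketch. The real worker validates with zod; the signal identifiers, the 32-character bucket cap, the charset, and the additive upsert below are all assumptions chosen for illustration.

```typescript
// Assumed signal names; the commit message only says the set is an enum.
const METRIC_SIGNALS = [
  "finish_reason", "empty_final", "provider_error",
  "cache_hit", "tool_error", "compaction", "turns",
] as const;

// Assumed cap: short, lowercase, restricted charset.
const BUCKET_PATTERN = /^[a-z0-9_.-]{1,32}$/;

interface MetricRow {
  date: string;
  version: string;
  os: string;
  signal: string;
  bucket: string;
  count: number;
}

function isValidRow(row: MetricRow): boolean {
  return (
    (METRIC_SIGNALS as readonly string[]).includes(row.signal) &&
    BUCKET_PATTERN.test(row.bucket) &&
    Number.isInteger(row.count) &&
    row.count >= 0
  );
}

// Upserts rows into a (date, version, os, signal, bucket) -> count table,
// skipping anything that fails validation. Returns how many rows were accepted.
function upsertRows(table: Map<string, number>, rows: MetricRow[]): number {
  let accepted = 0;
  for (const row of rows) {
    if (!isValidRow(row)) continue;
    const key = [row.date, row.version, row.os, row.signal, row.bucket].join("|");
    table.set(key, (table.get(key) ?? 0) + row.count);
    accepted++;
  }
  return accepted;
}
```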
…t/minimal mode (esengine#4031)

Three render paths were rendering read-only tools (read_file, grep, glob, web_fetch, ls) as individual tool cards even after they completed:
1. Hot zone final step (compact/minimal): running tools render individually, completed tools batch into ReadOnlyBatch.
2. TurnCollapse body (compact/minimal folded steps): same batching logic added to the pre-computed body array.
3. WarmTurnItems (expanded history turns): new roBatch/flushRO loop added so warm read-only tools are also grouped.

This keeps running read-only tools visible (with shimmer animation) and hides completed ones behind a compact "Read N files · Search N files" fold line, consistent across all compact/minimal display modes.

Co-authored-by: wufengfan <wufengfan@wufengfandeMacBook-Air.local>
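The batching loop shared by the three paths might look roughly like this. The types, the status values, and the function name `batchReadOnly` are hypothetical; the read-only tool list and the rule that only completed tools fold while running ones stay individual come from the description above.

```typescript
const READ_ONLY_TOOLS = new Set(["read_file", "grep", "glob", "web_fetch", "ls"]);

// Hypothetical item and row shapes for illustration.
interface ToolItem {
  name: string;
  status: "running" | "done" | "error";
}

type TranscriptRow =
  | { kind: "tool"; item: ToolItem }
  | { kind: "batch"; items: ToolItem[] };

// Groups consecutive completed read-only tools into one batch row;
// everything else (including running read-only tools) renders on its own.
function batchReadOnly(items: ToolItem[]): TranscriptRow[] {
  const rows: TranscriptRow[] = [];
  let pending: ToolItem[] = [];
  const flush = () => {
    if (pending.length > 0) {
      rows.push({ kind: "batch", items: pending });
      pending = [];
    }
  };
  for (const item of items) {
    if (READ_ONLY_TOOLS.has(item.name) && item.status === "done") {
      pending.push(item);
    } else {
      flush();
      rows.push({ kind: "tool", item });
    }
  }
  flush();
  return rows;
}
```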
…esengine#4032) WarmTurnItems batched read-only tools without the status guard the hot-zone and TurnCollapse paths use. Add it for consistency so an interrupted running tool lingering in history renders individually instead of folding into the batch. Drop a stray blank line. Follow-up to esengine#4031. Co-authored-by: reasonix <reasonix@deepseek.com>
…ngine#4033) Desktop side of the opt-in metrics telemetry (worker /v1/metrics already live). A metricsAggregator taps the existing tabEventSink event stream — internal/agent is untouched, so the CLI stays zero-egress — and counts enumerated facts only: finish_reason, empty_final, provider error class, cache-hit bucket, tool error class, compaction, and turns. Never message text, keys, prompts, or paths; error classes are derived from a status-code regex, not the message body. Gated on a new desktop.metrics flag (default off, separate from the default-on launch ping) and skipped in dev. Counts persist per turn and flush once at the next launch, mirroring the ping; a failed POST folds back to retry. The settings toggle and first-run disclosure land in a follow-up; until then the flag is set via config.toml. Co-authored-by: reasonix <reasonix@deepseek.com>
Surfaces the desktop.metrics opt-in in Settings, mirroring the telemetry ping wiring end to end: SettingsView field, the SetDesktopMetrics bound method, the bridge AppBindings + dev mock, and the panel toggle with a disclosure hint (en + zh). The hint is the disclosure — it spells out that only enumerated counters ship, never conversations, prompts, keys, or paths. SetDesktopMetrics starts/stops the aggregator live so the toggle takes effect immediately, and a.metrics becomes an atomic.Pointer so the event sink reads it race-free while the setting flips mid-session. Co-authored-by: reasonix <reasonix@deepseek.com>
…engine#4036) The marketing site bakes the version at build time from R2 latest.json, and pages.yml only rebuilds on site/** pushes — so a release left the static value (and JSON-LD softwareVersion) lagging until the next site edit. site.js's runtime .rxv refresh fixes the live page but not first paint / SEO. Dispatch pages.yml from the mirror job's stable path, once R2 latest/ has actually moved. Co-authored-by: reasonix <reasonix@deepseek.com>
…esengine#4048) Compaction folded the first user turn — the task brief and the user's stated facts and constraints — into a model-written summary, and a later fold then re-summarized that summary, degrading it to nothing and dropping the user's facts from context permanently. A real-provider run reproduced it: a token stated up front vanished from every request after the second fold, and the agent could no longer recall it. planCompaction now pins a small first user turn and any prior summaries in the verbatim prefix, so a fold never summarizes the task away and never re-folds an earlier summary. A large first turn (pasted content) stays foldable, capped by an absolute token ceiling and a window fraction, so pinning never starves the context. Co-authored-by: reasonix <reasonix@deepseek.com>
…digest (esengine#4052) Builds on the first-turn pin (esengine#4048): the deterministic floor now covers a fact the user states at ANY point, not just the opening turn. Compaction keeps every small user turn verbatim and folds only the assistant/tool work, so a mid-session "always deploy to eu-west-3" survives regardless of how the summarizer behaves. On top of that floor, the digest now leads with a structured "Standing facts & constraints" section consolidating what the user stated into one tidy view — redundant with the verbatim turns by design, so a weak summarizer dropping a fact there loses nothing. Co-authored-by: reasonix <reasonix@deepseek.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…-guidance fix(agent): clarify subagent continuation guidance
…overflow (esengine#4079) A turn that ends with a final answer (no trailing tool batch) skipped compaction entirely — maybeCompact ran only after tool batches and in the retry paths. So a large context carried into the next turn un-folded, and across a multi-turn session it accumulated until the next request exceeded the model's hard context limit and the provider returned 400, breaking the session. Compact at the end of the final-answer path too. It is a no-op below the trigger, so normal turns keep their warm cache; it folds only when the context is already over the threshold — exactly when the next turn would risk overflow. Co-authored-by: reasonix <reasonix@deepseek.com>
…he limit (esengine#4082) A compaction digest was eligible to be re-summarized by the next fold, so a fact it had captured could be dropped to summary-of-summary drift. Keep prior digests verbatim alongside the kept user turns and summarize only the new foldable work; digests accumulate (small) instead of collapsing into one lossy rolling summary. Document the guarantee in SPEC 3.6: a fact stated in a normal turn, and a fact a digest captured, survive any number of folds; a fact buried inside a single oversized message stays best-effort — there is no reliable way to auto-detect an arbitrary fact in bulk, so durable facts belong in their own turn. Co-authored-by: reasonix <reasonix@deepseek.com>
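Taken together, the three compaction changes (esengine#4048, esengine#4052, esengine#4082) amount to a partition of the history before folding. The sketch below is a simplification under stated assumptions: the message shape, the `isDigest` flag, and a plain character cap standing in for the token ceiling and window fraction are all hypothetical, and planCompaction itself is not shown on this page.

```typescript
// Hypothetical message shape; isDigest marks an earlier compaction summary.
interface ContextMsg {
  role: "user" | "assistant" | "tool";
  content: string;
  isDigest?: boolean;
}

// Splits history into messages kept verbatim and messages eligible for
// summarization. Small user turns and prior digests are never folded.
function planFold(
  history: ContextMsg[],
  maxPinnedChars: number, // stand-in for the real token-based caps
): { keep: ContextMsg[]; fold: ContextMsg[] } {
  const keep: ContextMsg[] = [];
  const fold: ContextMsg[] = [];
  for (const msg of history) {
    const smallUserTurn = msg.role === "user" && msg.content.length <= maxPinnedChars;
    if (msg.isDigest === true || smallUserTurn) {
      keep.push(msg);
    } else {
      fold.push(msg); // assistant/tool work and oversized pastes
    }
  }
  return { keep, fold };
}
```

An oversized user message still lands in `fold`, which matches the best-effort caveat in SPEC 3.6 for facts buried inside bulk content.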
… and adaptive width

- Simplified ModelSwitcher to match upstream (removed closing animation)
- Added search input with auto-focus for filtering models by name/provider
- Grouped models by provider with accent-colored group labels and Brain icon
- Adjusted width to adapt to content (min triggerWidth, max min(90vw, 480px))
- Added max-height: 50vh with scroll for large model lists
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f2645103a5
        >
          <Brain size={13} className="modelsw__kind" />
-         <span className="modelsw__label">{label}</span>
+         <span className="modelsw__label">{label}{currentProvider ? ` · ${currentProvider}` : ''}</span>
Refresh provider label after switching models
When a user opens the switcher and selects a model from a different provider, pick() closes the menu and calls onPick, but the models array still contains the old current flag until the menu is opened again. Since this trigger now renders the provider from that stale currentProvider, the parent can update label to the newly selected model while this suffix still shows the previous provider, e.g. mimo-v2.5-pro · DeepSeek. Derive the provider from the selected ref/label or refresh/update models after a successful pick so the trigger cannot mix new model text with the old provider.
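One way to follow this suggestion is to derive the whole trigger text from the selected ref instead of the `current` flag. The sketch below is hypothetical: the `ModelOption` shape, the `triggerLabel` helper, and the `selectedRef` parameter are illustrative names, not the component's actual props, and "OtherProvider" in the usage is a placeholder.

```typescript
// Hypothetical option shape; `current` may be stale until the menu reopens.
interface ModelOption {
  ref: string;
  label: string;
  provider: string;
  current: boolean;
}

// Builds the trigger text from the ref the user just picked, falling back
// to the `current` flag only when nothing has been picked yet.
function triggerLabel(models: ModelOption[], selectedRef: string | null): string {
  const selected =
    models.find((m) => m.ref === selectedRef) ?? models.find((m) => m.current);
  if (!selected) return "";
  return selected.provider ? `${selected.label} · ${selected.provider}` : selected.label;
}
```

Because the label and provider come from the same entry, the trigger cannot pair a new model name with the previous provider.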
f264510 to 9bc7b70
On the ModelSwitcher trigger button, append · and the provider name after the model name, for example:
deepseek-v4-flash · DeepSeek

Depends on esengine#4139 (pr/model-list base improvements)