feat(desktop): show the model provider on the trigger button #5

Closed
ttmouse wants to merge 53 commits into main-v2 from pr/model-trigger-label

Conversation

@ttmouse (Owner) commented Jun 12, 2026

On the ModelSwitcher trigger button, append " · <provider name>" after the model name, for example: deepseek-v4-flash · DeepSeek

Depends on esengine#4139 (pr/model-list base optimizations)

esengine and others added 30 commits June 11, 2026 14:04
…#3953) (esengine#3957)

A proxy that idle-closes the SSE connection with a clean FIN ends the
scan loop with no error, so the turn was committed as complete -- with
whatever fraction of the tool-call arguments had streamed. DeepSeek then
rejects every subsequent request that replays the truncated JSON with
HTTP 400, bricking the session.

Two layers:
- readStream now requires [DONE] or a finish_reason; a clean EOF
  without either is a connection cut, which the existing replay /
  StreamInterrupted recovery handles (partial calls are never emitted).
- SanitizeToolPairing closes truncated argument JSON before requests
  are built, so sessions already poisoned by this bug resume working.
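
The first layer above can be sketched roughly as follows. This is a minimal TypeScript illustration, not the project's Go code: `streamOutcome` and the `SseEvent` shape are hypothetical names, and the chunk layout assumes an OpenAI-style `choices[0].finish_reason` field.

```typescript
// Hypothetical event shape: one parsed SSE `data:` payload per entry.
type SseEvent = { data: string };

type StreamOutcome = "complete" | "interrupted";

// A stream counts as complete only if it carried a [DONE] sentinel or a
// chunk with a non-empty finish_reason. A clean EOF without either is
// reported as an interruption so the caller can retry or recover.
function streamOutcome(events: SseEvent[]): StreamOutcome {
  for (const ev of events) {
    if (ev.data.trim() === "[DONE]") return "complete";
    try {
      const chunk = JSON.parse(ev.data);
      const reason = chunk?.choices?.[0]?.finish_reason;
      if (typeof reason === "string" && reason.length > 0) return "complete";
    } catch {
      // Payloads that are not valid JSON are ignored in this sketch.
    }
  }
  return "interrupted";
}
```

With this shape, a stream that ends after a few content deltas and nothing else is classified as `"interrupted"` rather than committed as a finished turn.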

Co-authored-by: reasonix <reasonix@deepseek.com>
… dual-model sessions (esengine#3958)

In planner+executor mode the executor session opens with the handoff
boilerplate, so every session preview and auto-generated title collapsed
to 'Reasonix executor'. HandoffTask extracts the embedded original task;
the session picker, serve preview, and desktop topic titles all use it.

Closes esengine#3860

Co-authored-by: reasonix <reasonix@deepseek.com>
…ig.json (esengine#3960)

v0.x stored MCP servers in two shapes: the canonical mcpServers map and
the older `mcp` list of --mcp-format strings (name=cmd args, name=URL
for SSE, name=streamable+URL), with mcpEnv/mcpDisabled keyed by name.
Both the one-time migration and the live lowest-priority legacy source
only read mcpServers, so servers configured the old way -- including the
default memory server -- silently vanished on upgrade.

Parse the string list in both paths; mcpServers wins a name collision,
matching the v0.x merge order. Anonymous specs get a synthesized name
since v1+ plugins require one.
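
A rough TypeScript sketch of the parsing and merge order described above. The names `parseLegacySpec`, `mergeLegacy`, and the `legacy-N` synthesized name are hypothetical, and argument splitting is simplified to whitespace (no quoting), which may differ from the real --mcp-format rules.

```typescript
type McpServer =
  | { kind: "stdio"; command: string; args: string[] }
  | { kind: "sse"; url: string }
  | { kind: "streamable"; url: string };

const isUrl = (s: string): boolean => /^https?:\/\//.test(s);

// Parses one legacy string: name=cmd args, name=URL, or name=streamable+URL.
// Specs without a leading "name=" get a synthesized name (assumed format).
function parseLegacySpec(spec: string, index: number): [string, McpServer] {
  const m = /^([A-Za-z0-9_.-]+)=(.*)$/.exec(spec.trim());
  const name = m ? m[1] : `legacy-${index}`;
  const body = (m ? m[2] : spec).trim();
  if (body.startsWith("streamable+")) {
    return [name, { kind: "streamable", url: body.slice("streamable+".length) }];
  }
  if (isUrl(body)) return [name, { kind: "sse", url: body }];
  const [command, ...args] = body.split(/\s+/);
  return [name, { kind: "stdio", command, args }];
}

// Legacy entries are applied first, so mcpServers wins a name collision.
function mergeLegacy(
  mcpServers: Record<string, McpServer>,
  legacy: string[],
): Record<string, McpServer> {
  const out: Record<string, McpServer> = {};
  legacy.forEach((spec, i) => {
    const [name, server] = parseLegacySpec(spec, i);
    out[name] = server;
  });
  return { ...out, ...mcpServers };
}
```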

Closes esengine#3949

Co-authored-by: reasonix <reasonix@deepseek.com>
…sengine#3961)

will-change: transform plus a fill-mode that keeps the final transform
applied left the settings/history/trash modal permanently composited,
rasterized at the entrance-animation scale. On Windows fractional DPI
the whole panel rendered blurry until a tab switch or resize forced a
repaint. Let the entrance animation end with no retained style so the
modal returns to the normal raster path.

Closes esengine#3838
Closes esengine#3902
Closes esengine#3899

Co-authored-by: reasonix <reasonix@deepseek.com>
…sengine#3963)

The darwin chrome blanket-marks the whole tabbar no-drag and leaves only
a 10px rail at the very top as the drag region, so dragging the visible
titlebar area selects text instead of moving the window. Re-enable drag
on the spacer between the last tab and the search button and stretch it
to the strip height -- the area every user actually grabs.

Closes esengine#3853
Closes esengine#3852

Co-authored-by: reasonix <reasonix@deepseek.com>
…-resume maintenance (esengine#3968)

* feat(agent): prune stale tool results before folding and on cold resume

Stale tool results are re-derivable (files re-read, commands re-run), so
eliding them is a free, lossless alternative to the paid summarize fold.
Prune runs only where a cache reset is already being paid: at the compact
trigger, where it skips the fold entirely when eliding alone clears the
threshold, and on resume after the provider prefix cache has expired
(cacheColdAfter), where rewriting history costs no extra misses and
directly shrinks the full-price first request. No message is ever removed,
so tool_call/result pairing and signed reasoning stay intact by
construction; originals are archived like fold drops.

* bench(e2e): context-maintenance driver — cold-resume A/B + placeholder comprehension

Three real-API scenarios for the prune work: seed two identical-shape fat
sessions, A/B the cold-restart miss tokens with and without pruning after
the provider cache has expired, and verify the model re-reads a file behind
a prune placeholder instead of answering from nothing.

* fix(control): widen cold-resume prune threshold to a risk-asymmetric default

Pruning a still-cached session costs ~4x the miss tokens of leaving it
alone (measured warm-cache A/B), while a threshold that is too large only
forgoes a free prune. Default to 24h until the cache-ttl probe pins the
real retention; never set below it.

* bench(e2e): check write/unmarshal errors in the maintenance driver (errcheck)

* fix(control): derive cold-resume idle from branch meta only (CodeQL go/path-injection)

The os.Stat mtime fallback fed a user-influenced path straight into a
filesystem call. Branch meta is guaranteed for every session the controller
has snapshotted, so the fallback only ever covered never-saved imports —
those now skip one prune until their first snapshot creates the meta.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
…ch kbd (esengine#3964)

Follow-up to esengine#3962: the removed legacy block was the only carrier of the
.palette__item:focus-visible outline, and the detached chrome search
button (rendered on Windows/Linux only) was hardcoded to the mac glyph.

Closes esengine#3841

Co-authored-by: reasonix <reasonix@deepseek.com>
…ispatches (esengine#3951)

* fix(chat): keep each tool's marker under its own card in back-to-back dispatches

When the model dispatched two Bash tools in quick succession, late
ToolProgress chunks for the first tool no longer matched the current
toolStreamID, so streamToolOutput fell through to the generic
collapse-then-append path. The fresh live block landed at the tail of
the transcript under the second tool's card, and the first tool's
collapsed ⎿ marker stacked beneath the second card as well — making
the two runs visually indistinguishable.

The fix threads the slot beginToolRunning recorded for the first
dispatch through shellTranscriptIdx. When a late progress (or late
result) for that id arrives, streamToolOutput and collapseToolOutput
now reuse the recorded slot instead of appending, so each tool's live
block and final summary stay directly under its own card regardless
of dispatch/progress arrival order.

Adds TestConsecutiveToolCallsKeepMarkersUnderOwnCard to lock the
behaviour in: it verifies both markers are present and that the first
card's marker previews the full output, not just the chunk visible
before the second tool took over.

* fix(chat): clear the slot for back-to-back non-shell tools

The back-to-back dispatch fix in 8b1dcd7 records every dispatched
id in shellTranscriptIdx, so a late ToolResult for an earlier tool
can land in the correct slot. But for non-shell-prefixed tools
(e.g. read_file dispatched in parallel) the streaming state belongs
to the current id and the shellOutputs accumulator is never
populated, so the late path in collapseShellSlot computed n = -1
and rendered the final else branch as '⎿ -1 lines'. The visual
result was two negative-count markers stacked at the end of the
transcript, one per parallel tool.

Guard n < 0 in collapseShellSlot by treating the unknown as zero
output: clear the slot rather than fabricate a count. The id stays
recorded in shellTranscriptIdx (set by beginToolRunning) so a late
ToolProgress that finally arrives for that id can still extend the
slot in place.

Add TestConsecutiveNonShellToolsDoNotRenderNegativeLineCount, a
minimal regression covering the read_file/read_file case the
reviewer flagged: two back-to-back dispatches, late ToolResult
for the first one. Verified by stashing the chat_tui.go change
and rerunning — the test fails with the exact '-1 lines' transcript
the user reported.
Follow-up to esengine#3951: the late-result branch of collapseToolOutput
mutated the transcript without setting transcriptDirty, so the
rewritten slot waited for the next unrelated event to paint. Set the
flag inside collapseShellSlot, covering both callers.

Co-authored-by: reasonix <reasonix@deepseek.com>
…sengine#3978)

* feat(desktop): user-initiated crash reporting via crash.reasonix.io

The crash overlay could only ask users to screenshot or copy the error,
so most reports never reached us. Add an opt-in-per-click "Send report"
button wired to a new ReportCrash binding that scrubs user names from
paths, attaches version/GOOS/GOARCH, and POSTs to a Cloudflare Worker
(workers/crash-report): D1-backed, fingerprint-deduped (5 raw samples
per group), per-IP rate limited. Nothing is ever sent automatically.

* chore(workers): pin crash-report D1 database_id from initial deploy

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
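- Remove the duplicated palette CSS rule sets and use the main styles throughout
- Add a dedicated icon for each command (SquarePen, History, Trash2, etc.)
- Session items show the workspace path, relative date, and number of conversation turns
- Turn counts use tabular-nums for fixed-width alignment
- Hints switch to a flex layout, with path, date, and turn count each taking one slot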
…ommand-string drift (esengine#3982)

* fix(evidence): match paraphrased verification commands and guide complete_step self-correction

complete_step rejected real verifications whenever the cited command string
was not byte-identical to the bash receipt: a dropped cd prefix (esengine#2917), a
flag or quote-style drift, or a piped tail all failed both the ledger match
and the esengine#3587 session fallback's prefix matching. Local session forensics
show 5 of 18 real complete_step calls rejected this way, each cascading into
todo_write failures and final answers that overclaim.

Match commands by shell segment instead: split cited and ran commands on
&&/||/;/|/newlines, quote-strip and whitespace-normalize tokens, and accept a
cited segment when a ran segment equals it or supersets its tokens under the
same head token. One-token citations still require exact equality, and an
aggregated citation that no single command covers is still rejected. The
session fallback now uses the same matcher and skips calls whose recorded
result is an error or block, closing the false positive where any attempted
command counted as proof.
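
The segment matching described above might look roughly like this in TypeScript. It is a sketch under assumptions: `segments`, `segmentCovers`, and `citationMatches` are hypothetical names, the split ignores quoting (a separator inside quotes would still split), and "no single command covers" is read as "every cited segment must be found within one ran command".

```typescript
// Split a command line into shell segments of normalized tokens.
function segments(cmd: string): string[][] {
  return cmd
    .split(/&&|\|\||[;|\n]/)
    .map((seg) => seg.replace(/["']/g, "").trim().split(/\s+/).filter(Boolean))
    .filter((tokens) => tokens.length > 0);
}

// A ran segment covers a cited one when the head token matches and the ran
// tokens are a superset; one-token citations require exact equality.
function segmentCovers(ran: string[], cited: string[]): boolean {
  if (ran[0] !== cited[0]) return false;
  if (cited.length === 1) return ran.length === 1;
  return cited.every((token) => ran.includes(token));
}

// The citation is accepted only if one ran command covers all of its segments.
function citationMatches(cited: string, ranCommands: string[]): boolean {
  const want = segments(cited);
  if (want.length === 0) return false;
  return ranCommands.some((ran) => {
    const have = segments(ran);
    return want.every((c) => have.some((r) => segmentCovers(r, c)));
  });
}
```

Under this reading, citing `go test ./...` matches a receipt of `cd repo && go test ./... | tail -5`, while citing `go test ./... && go vet ./...` is rejected when the two commands were run separately.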

Rejections now carry recovery context: ran-but-nonzero commands are
distinguished from never-ran (with a '|| true' hint for negative
verification, e.g. proving a file is gone), never-ran rejections list the
turn's actual receipts, and the schema marks command/paths as required for
their kinds instead of advertising them optional.

* fix(evidence): count bash commands naming a path as files receipts

Files created or edited through shell redirection (seq … > file, sed -i)
leave no reader/writer receipt, so files evidence for them was always
rejected and the model had to re-write the file with write_file just to
mint a receipt. A successful bash command whose text names the path now
counts as having touched it.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
…sengine#3985)

Crash reports alone answer "what broke" but not "for whom" or "how many
are even running this". Three additions, all on the existing worker:

- Launch ping: one POST per start with a random install id + version +
  OS facts. Gated on new desktop.telemetry config (default on, toggle in
  Settings > Updates, disclosed in desktop/README.md); dev builds never
  ping. Carries no conversation, key, or file data.
- Crash reports now attach coarse device facts (OS version, CPU model,
  cores, RAM) so "only crashes on X" patterns are visible.
- Worker grows /v1/ping (per-day install dedup with opens counter) and a
  Basic-auth /stats page (daily actives, version/platform breakdown,
  recent crash groups) so day-to-day reading needs no SQL.

Co-authored-by: reasonix <reasonix@deepseek.com>
…animation (esengine#3991)

De-card tool/reasoning/step rows so the transcript reads as one quiet
column: shared reasoning__head fold line, ChevronRight everywhere, long
commands truncate instead of wrapping, and running rows get a text-clip
shimmer sweep instead of a block background. Adds a transcript display
mode setting (standard/compact/minimal) persisted in reasonix.toml and
hydrated into the frontend at boot so config stays the source of truth.

Co-authored-by: wufengfan <wufengfan@wufengfandeMacBook-Air.local>
Prompt only for active model keys
Co-authored-by: reasonix <reasonix@deepseek.com>
…up detail pages (esengine#3997)

Co-authored-by: reasonix <reasonix@deepseek.com>
…ds, 0s label, thinking flicker) (esengine#4000)

* fix(desktop): keep step folds in the centered transcript column

.turn-collapse and .readonly-batch set margin: Npx 0, which outranks
the .transcript > * auto centering by source order, so compact-mode
folds rendered flush against the pane's left edge.

* fix(desktop): make compact-mode step folds honest and calm

Processed folds counted reasoning-only assistants as content, so
minimal mode produced expandable folds over an empty body, and
sub-second batches labeled themselves 0s. Filter to items the body
actually renders (hiding the fold when nothing survives), drop the
seconds suffix below 1s, and thread subcalls so nested tool calls
show inside the fold. Streaming reasoning no longer auto-expands in
compact/minimal — steps fold away on completion, so auto-open read
as open/close flicker.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
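- Command items switch to a 4-5 column grid layout that compactly shows icon + title
- Session items keep the list layout, showing path, date, and turn count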
* fix: improve desktop context status metrics

* fix: address context metrics review feedback
… todos (esengine#4006)

* fix(evidence): tolerate citation drift when matching complete_step to todos

The todo-step matcher demanded byte-exact (case-folded) equality between
complete_step.step and a todo's text, so a fullwidth/halfwidth colon or
whitespace drift ("Phase 5:…" cited as "Phase 5: …") could never match
and the model looped on "no matching todo_write item" retries, burning
tokens (discussion esengine#3970). Same disease esengine#3982 cured for command
citations, different limb.

Normalize both sides (fullwidth ASCII → halfwidth, whitespace dropped,
case-folded) before comparing, fall back to unique substring containment
(≥6 runes; ambiguous citations stay unmatched), and list this turn's
todos in the rejection so the model can self-correct by verbatim content
or index instead of guessing.
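
A small TypeScript sketch of the normalization and fallback described above. `normalizeStep` and `matchTodo` are hypothetical names; the sketch assumes the substring check runs in one direction only (the todo text contains the citation), which the commit message does not spell out.

```typescript
// Fullwidth ASCII (U+FF01..U+FF5E) maps to halfwidth by subtracting 0xFEE0;
// whitespace (including U+3000) is dropped and the rest is lowercased.
function normalizeStep(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    const ch = code >= 0xff01 && code <= 0xff5e ? String.fromCharCode(code - 0xfee0) : s[i];
    if (!/\s/.test(ch)) out += ch.toLowerCase();
  }
  return out;
}

// Returns the index of the matching todo, or -1 when there is no match or
// the citation is ambiguous.
function matchTodo(cited: string, todos: string[]): number {
  const want = normalizeStep(cited);
  const normalized = todos.map(normalizeStep);
  const exact = normalized.indexOf(want);
  if (exact !== -1) return exact;
  if (Array.from(want).length < 6) return -1; // too short for substring fallback
  const hits: number[] = [];
  normalized.forEach((todo, i) => {
    if (todo.includes(want)) hits.push(i);
  });
  return hits.length === 1 ? hits[0] : -1;
}
```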

* style: gofmt evidence_test (CJK-width map alignment)

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
…e#3994)

Stale or empty model refs fell back to the first configured provider, ignoring the user's default_model. Try default_model (when keyed) before iterating providers.
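
The fallback order could be sketched like this in TypeScript. Everything here is illustrative: `resolveModel` and the `Provider` shape are hypothetical, and "when keyed" is read as "its provider has an API key configured".

```typescript
// Hypothetical provider shape for illustration only.
type Provider = { name: string; models: string[]; hasKey: boolean };

function resolveModel(
  ref: string,
  defaultModel: string,
  providers: Provider[],
): string | undefined {
  // A ref is usable only if a keyed provider actually offers it.
  const usable = (r: string): boolean =>
    providers.some((p) => p.hasKey && p.models.includes(r));

  if (ref && usable(ref)) return ref;
  // Try the user's default_model before falling back to provider order.
  if (defaultModel && usable(defaultModel)) return defaultModel;
  for (const p of providers) {
    if (p.hasKey && p.models.length > 0) return p.models[0];
  }
  return undefined;
}
```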

Closes esengine#3801
…sengine#4010)

Parallel call_N-style bash tools lost their per-call line count: streamToolOutput reset the active id's count on every switch and collapseShellSlot's late path only recovered it from shellOutputs (shell- ids only), so call_N ids rendered '-1 lines'. Stash the per-id count before reset and accept the ToolResult output as a last-resort source.

Closes esengine#4003
…gine#4014)

Per-turn evidence ledger reset made complete_step reject cross-turn citations and let the final gate miss an unfinished plan. diff/files evidence now falls back to the full session (like commands, esengine#3587); the host keeps a canonical todo list (survives turns + compaction) the gate consults; a successful complete_step advances that list so the model no longer batches todo_write (esengine#3909). Real-API A/B confirmed base rejects/blocks where the PR accepts/advances.

Closes esengine#2917
…sengine#3892)

Add Feishu/Lark + WeChat connection flows with status management, diagnostics, and scoped connection persistence (env-name credentials, no raw secrets). Surface IM-origin conversations in the desktop UI with sidebar management and connection details; per-platform session model/workspace routing in the gateway.
esengine#4022)

* fix(openai): round-trip reasoning_content on DeepSeek tool_calls turns

DeepSeek's thinking mode now rejects an assistant tool_calls turn whose
reasoning_content was dropped on replay (400 "reasoning_content … must be
passed back to the API"). The provider stripped it unconditionally, so any
cache-miss replay of a tool-calling history — session resume, compaction,
or a turn after the prompt cache expires — 400s, while warm consecutive
turns are tolerated because DeepSeek still holds the reasoning server-side.

Round reasoning_content back, but only on the assistant turn that carries
tool calls and only for the DeepSeek protocol: a plain assistant text turn
does not require it, and other backends still bill it as input for nothing.
Reasoning enters the cached prefix and is reused on later turns, so the
cost is one miss per chain, not a cache collapse.

* feat(agent): add finish_reason diagnostics to the empty-final notice

The empty-final warning carried no context, so reports couldn't tell its
three causes apart: a length-truncated turn, a reasoning-only stop, or a
provider that swallowed the answer. Include the provider, the finish
reason, and the reasoning length so each occurrence is self-diagnosing.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
SivanCola and others added 15 commits June 11, 2026 19:24
fix(desktop): improve command palette styling and information display
…sengine#4023)

The rule :root[data-theme-style] .msg--user .msg__body (with no specific
style value) leaked graphite's inverted-color bubble (--fg as background,
--bg as text) onto all other themes like carbon, aurora, and slate.

Fix: narrow both selectors to [data-theme-style="graphite"].

Co-authored-by: wufengfan <wufengfan@wufengfandeMacBook-Air.local>
…sengine#4030)

Adds the worker side of the desktop opt-in metrics flush: a generic
(date, version, os, signal, bucket) -> count table, a zod-validated
/v1/metrics POST that upserts per-launch aggregate snapshots, a dedicated
rate-limit bucket, and a stats-page section that breaks each signal down
by bucket.

The signal set is an enum and the bucket is length+charset capped, so the
table can never be polluted with arbitrary keys. No install id and no
content reach this endpoint — only enumerated counters.

Deploy is owner-run: apply schema.sql to D1, then wrangler deploy.

Co-authored-by: reasonix <reasonix@deepseek.com>
…t/minimal mode (esengine#4031)

Three render paths were rendering read-only tools (read_file, grep, glob,
web_fetch, ls) as individual tool cards even after they completed:

1. Hot zone final step (compact/minimal) — running tools render
   individually, completed tools batch into ReadOnlyBatch.
2. TurnCollapse body (compact/minimal folded steps) — same batching
   logic added to the pre-computed body array.
3. WarmTurnItems (expanded history turns) — new roBatch/flushRO loop
   added so warm read-only tools are also grouped.

This keeps running read-only tools visible (with shimmer animation) and
hides completed ones behind a compact "Read N files · Search N files"
fold line, consistent across all compact/minimal display modes.

Co-authored-by: wufengfan <wufengfan@wufengfandeMacBook-Air.local>
…esengine#4032)

WarmTurnItems batched read-only tools without the status guard the hot-zone and TurnCollapse paths use. Add it for consistency so an interrupted running tool lingering in history renders individually instead of folding into the batch. Drop a stray blank line. Follow-up to esengine#4031.

Co-authored-by: reasonix <reasonix@deepseek.com>
…ngine#4033)

Desktop side of the opt-in metrics telemetry (worker /v1/metrics already
live). A metricsAggregator taps the existing tabEventSink event stream —
internal/agent is untouched, so the CLI stays zero-egress — and counts
enumerated facts only: finish_reason, empty_final, provider error class,
cache-hit bucket, tool error class, compaction, and turns. Never message
text, keys, prompts, or paths; error classes are derived from a status-code
regex, not the message body.

Gated on a new desktop.metrics flag (default off, separate from the
default-on launch ping) and skipped in dev. Counts persist per turn and
flush once at the next launch, mirroring the ping; a failed POST folds back
to retry. The settings toggle and first-run disclosure land in a follow-up;
until then the flag is set via config.toml.

Co-authored-by: reasonix <reasonix@deepseek.com>
Surfaces the desktop.metrics opt-in in Settings, mirroring the telemetry
ping wiring end to end: SettingsView field, the SetDesktopMetrics bound
method, the bridge AppBindings + dev mock, and the panel toggle with a
disclosure hint (en + zh). The hint is the disclosure — it spells out that
only enumerated counters ship, never conversations, prompts, keys, or paths.

SetDesktopMetrics starts/stops the aggregator live so the toggle takes
effect immediately, and a.metrics becomes an atomic.Pointer so the event
sink reads it race-free while the setting flips mid-session.

Co-authored-by: reasonix <reasonix@deepseek.com>
…engine#4036)

The marketing site bakes the version at build time from R2 latest.json,
and pages.yml only rebuilds on site/** pushes — so a release left the
static value (and JSON-LD softwareVersion) lagging until the next site
edit. site.js's runtime .rxv refresh fixes the live page but not first
paint / SEO. Dispatch pages.yml from the mirror job's stable path, once
R2 latest/ has actually moved.

Co-authored-by: reasonix <reasonix@deepseek.com>
…esengine#4048)

Compaction folded the first user turn — the task brief and the user's stated
facts and constraints — into a model-written summary, and a later fold then
re-summarized that summary, degrading it to nothing and dropping the user's
facts from context permanently. A real-provider run reproduced it: a token
stated up front vanished from every request after the second fold, and the
agent could no longer recall it.

planCompaction now pins a small first user turn and any prior summaries in the
verbatim prefix, so a fold never summarizes the task away and never re-folds an
earlier summary. A large first turn (pasted content) stays foldable, capped by
an absolute token ceiling and a window fraction, so pinning never starves the
context.

Co-authored-by: reasonix <reasonix@deepseek.com>
…digest (esengine#4052)

Builds on the first-turn pin (esengine#4048): the deterministic floor now covers a fact
the user states at ANY point, not just the opening turn. Compaction keeps every
small user turn verbatim and folds only the assistant/tool work, so a mid-session
"always deploy to eu-west-3" survives regardless of how the summarizer behaves.

On top of that floor, the digest now leads with a structured "Standing facts &
constraints" section consolidating what the user stated into one tidy view —
redundant with the verbatim turns by design, so a weak summarizer dropping a fact
there loses nothing.

Co-authored-by: reasonix <reasonix@deepseek.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…-guidance

fix(agent): clarify subagent continuation guidance
…overflow (esengine#4079)

A turn that ends with a final answer (no trailing tool batch) skipped
compaction entirely — maybeCompact ran only after tool batches and in the
retry paths. So a large context carried into the next turn un-folded, and
across a multi-turn session it accumulated until the next request exceeded the
model's hard context limit and the provider returned 400, breaking the session.

Compact at the end of the final-answer path too. It is a no-op below the
trigger, so normal turns keep their warm cache; it folds only when the context
is already over the threshold — exactly when the next turn would risk overflow.

Co-authored-by: reasonix <reasonix@deepseek.com>
…he limit (esengine#4082)

A compaction digest was eligible to be re-summarized by the next fold, so a
fact it had captured could be dropped to summary-of-summary drift. Keep prior
digests verbatim alongside the kept user turns and summarize only the new
foldable work; digests accumulate (small) instead of collapsing into one lossy
rolling summary.

Document the guarantee in SPEC 3.6: a fact stated in a normal turn, and a fact a
digest captured, survive any number of folds; a fact buried inside a single
oversized message stays best-effort — there is no reliable way to auto-detect an
arbitrary fact in bulk, so durable facts belong in their own turn.
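
Taken together with the earlier pinning commits, the keep/fold split might be sketched as below. This is an assumption-heavy TypeScript illustration: `planFold`, the `digest` role, and the ceiling and fraction values are all hypothetical, and the sketch ignores any recent tail that the real planner may also keep.

```typescript
type Msg = {
  role: "user" | "assistant" | "tool" | "digest";
  tokens: number;
  text: string;
};

// Small user turns and prior digests stay verbatim; everything else
// (assistant/tool work, oversized user messages) is eligible for folding.
function planFold(
  history: Msg[],
  windowTokens: number,
  opts = { ceiling: 2000, fraction: 0.05 }, // hypothetical limits
): { keep: Msg[]; fold: Msg[] } {
  const cap = Math.min(opts.ceiling, Math.floor(windowTokens * opts.fraction));
  const keep: Msg[] = [];
  const fold: Msg[] = [];
  for (const m of history) {
    const pinned = m.role === "digest" || (m.role === "user" && m.tokens <= cap);
    if (pinned) keep.push(m);
    else fold.push(m);
  }
  return { keep, fold };
}
```

In this sketch a short instruction such as "always deploy to eu-west-3" lands in `keep`, while a 50,000-token pasted log from the user lands in `fold`, which matches the best-effort caveat for oversized messages.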

Co-authored-by: reasonix <reasonix@deepseek.com>
ttmouse added 3 commits June 12, 2026 15:27
… and adaptive width

- Simplified ModelSwitcher to match upstream (removed closing animation)
- Added search input with auto-focus for filtering models by name/provider
- Grouped models by provider with accent-colored group labels and Brain icon
- Adjusted width to adapt to content (min triggerWidth, max min(90vw, 480px))
- Added max-height: 50vh with scroll for large model lists

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2645103a5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

  >
  <Brain size={13} className="modelsw__kind" />
- <span className="modelsw__label">{label}</span>
+ <span className="modelsw__label">{label}{currentProvider ? ` · ${currentProvider}` : ''}</span>

[P2] Refresh provider label after switching models

When a user opens the switcher and selects a model from a different provider, pick() closes the menu and calls onPick, but the models array still contains the old current flag until the menu is opened again. Since this trigger now renders the provider from that stale currentProvider, the parent can update label to the newly selected model while this suffix still shows the previous provider, e.g. mimo-v2.5-pro · DeepSeek. Derive the provider from the selected ref/label or refresh/update models after a successful pick so the trigger cannot mix new model text with the old provider.
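
One way to follow the suggestion, sketched in TypeScript: look the provider up from the label the parent passes in, instead of from the `current` flag. The `ModelEntry` shape and `triggerText` helper are hypothetical, since the real ModelSwitcher props are not shown on this page, and `OtherProvider` in the usage note is a placeholder name.

```typescript
// Hypothetical model entry shape for illustration only.
type ModelEntry = {
  ref: string;
  label: string;
  provider: string;
  current?: boolean;
};

// Derive the suffix from the label being displayed, so the model name and
// provider always come from the same entry even if `current` is stale.
function triggerText(label: string, models: ModelEntry[]): string {
  const match = models.find((m) => m.label === label);
  return match ? `${label} · ${match.provider}` : label;
}
```

With a list whose `current` flag still points at a DeepSeek model, `triggerText("mimo-v2.5-pro", models)` yields `mimo-v2.5-pro · OtherProvider` rather than mixing the new model with the old provider. If two providers can expose the same label, matching on `ref` would be the safer key.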

Useful? React with 👍 / 👎.

@ttmouse ttmouse force-pushed the pr/model-trigger-label branch from f264510 to 9bc7b70 Compare June 12, 2026 07:28
@ttmouse ttmouse changed the base branch from pr/model-list to main-v2 June 12, 2026 07:28
@ttmouse ttmouse closed this Jun 12, 2026

9 participants