mcp: work on several updates in parallel (workspaces, background jobs, lock wait) by plusky · Pull Request #216 · openSUSE/mtui

plusky · 2026-06-22T15:51:31Z

Lets one client — or several agents — keep more than one update moving at once, without each load_template tearing the previous update's hosts down. Five commits.

1. Named workspaces

Every tool gains an optional workspace selector (default "default"). Each name resolves, via the existing SessionRegistry, to its own isolated McpSession (own loaded template, targets, per-session lock). Calls in different workspaces run concurrently, so a single stdio client can advance update B while update A's slow host op runs. stdio now uses the registry too (idle sweeper disabled). New tools: list_workspaces, close_workspace. The default workspace reproduces today's behaviour.
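To make the workspace routing concrete, here is a minimal sketch of how a per-client base key plus a workspace name could resolve to isolated sessions. The names `workspace_key`, `split_workspace_key`, `resolve_session` and `live_sessions` come from the commit messages; the `::` separator, the `Session` stand-in and the `Registry` class body are assumptions made for illustration, not the PR's actual code.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional

SEP = "::"  # assumed separator; the PR does not state the real key format


def workspace_key(base: str, workspace: str = "default") -> str:
    """Compose a registry key from a per-client base key and a workspace name."""
    return f"{base}{SEP}{workspace}"


def split_workspace_key(key: str):
    """Parse a registry key back into (base, workspace)."""
    base, _, workspace = key.rpartition(SEP)
    return base, workspace


@dataclass
class Session:
    """Stand-in for McpSession: own template, own targets, own lock."""
    key: str
    template: Optional[str] = None
    targets: dict = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)


class Registry:
    """Hypothetical registry that mints one session per (client, workspace)."""

    def __init__(self):
        self._sessions = {}
        self._guard = threading.Lock()

    def resolve_session(self, base: str, workspace: str = "default") -> Session:
        key = workspace_key(base, workspace)
        with self._guard:
            if key not in self._sessions:
                # Minted lazily, so callers that never name a workspace
                # only ever see the "default" session.
                self._sessions[key] = Session(key)
            return self._sessions[key]

    def live_sessions(self):
        with self._guard:
            return list(self._sessions.values())
```

Because each session carries its own lock, a slow call holding workspace A's lock does not stop a call in workspace B from proceeding.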

2. Async background jobs

Slow host commands (run, update, downgrade, prepare, install, uninstall, set_repo, reboot) accept background=true: the call returns a job id immediately instead of holding the request open, and the command still runs under the workspace lock. Poll with job_status and fetch the output with job_result; job_list and job_cancel are also available.
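The background flow can be sketched with plain asyncio. This is an illustration under assumptions, not the PR's implementation: `JobTable` is a hypothetical class, the status strings and result dictionaries are invented here, and an `asyncio.Lock` stands in for the per-session lock. Only the tool names (start_job, job_status, job_result, job_cancel) are taken from the PR.

```python
import asyncio
import itertools


class JobTable:
    """Hypothetical per-workspace job table; not the PR's actual class."""

    def __init__(self):
        self._lock = asyncio.Lock()  # stand-in for the per-session lock
        self._jobs = {}              # job id -> asyncio.Task
        self._ids = itertools.count(1)

    def start_job(self, command, *args):
        """Schedule `command(*args)` and return a job id immediately."""
        job_id = f"job-{next(self._ids)}"

        async def runner():
            # Hold the session lock for the whole command, so the job
            # serialises against other mutating calls in this workspace.
            async with self._lock:
                return await command(*args)

        self._jobs[job_id] = asyncio.create_task(runner())
        return job_id

    def job_status(self, job_id):
        task = self._jobs.get(job_id)
        if task is None:
            return "unknown"
        if not task.done():
            return "running"
        if task.cancelled():
            return "cancelled"
        return "failed" if task.exception() is not None else "done"

    def job_result(self, job_id):
        status = self.job_status(job_id)
        if status == "done":
            return {"status": status, "output": self._jobs[job_id].result()}
        if status == "failed":
            return {"status": status, "error": str(self._jobs[job_id].exception())}
        return {"status": status}  # running, cancelled or unknown: nothing to fetch

    def job_cancel(self, job_id):
        task = self._jobs.get(job_id)
        return task is not None and task.cancel()
```

`start_job` has to be called from inside a running event loop, since it relies on `asyncio.create_task`; a client would then poll `job_status` and call `job_result` once the status is no longer "running".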

3. Wait for a busy refhost

Separate agents are separate processes, so a refhost lock genuinely excludes them. [lock] wait (seconds, default 0 = fail-fast) makes locking queue on a busy host instead of erroring: it polls every [lock] wait_poll seconds until the foreign lock is released, reaped as stale, or becomes ours. The connect-time warning is untouched; a warning is logged on wait-start and on timeout, so a REPL user still sees that the host is busy.
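The wait policy amounts to a bounded polling loop. The sketch below is a simplified stand-in for the `_wait_for_release` helper named in the commit message: the function signature, the injected `clock` and `sleep` callables and the boolean return value are assumptions chosen so the loop can be exercised with a fake clock.

```python
import time


def wait_for_release(is_locked_by_other, wait, wait_poll=15,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until the foreign lock is gone or `wait` seconds have passed.

    Returns True when the host became free, False on timeout. With wait=0
    a busy host fails immediately, which is the fail-fast default.
    """
    deadline = clock() + wait
    while is_locked_by_other():
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # the caller would raise TargetLockedError here
        # Never sleep past the deadline.
        sleep(min(wait_poll, remaining))
    return True
```

Injecting the clock keeps the loop testable without real sleeping, in the spirit of the "fake clock" test the lock-wait commit mentions.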

4. Refhost pool selection

When refhosts.yml lists several interchangeable hosts for the same test target and [refhosts] pool_select is on, add_host connects just one free host per target instead of the whole matrix: it tries candidates in turn, skips any locked by another agent, and claims (locks) the one it takes, so parallel agents draw distinct hosts. The target is the full product + version + arch + addons the update asks for, not just arch, so an update spanning all arches of SLE15-SP5 and SP7 still gets a host per (service-pack, arch); only genuine duplicates collapse to one. The pool is searched across all locations (location is ignored). Pool selection is off by default.
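The selection rule can be sketched as "group candidates by slot, keep the first free host in each group". The function below is a hypothetical simplification: it takes plain `(host, slot)` pairs and a `try_claim` callable rather than real Target objects, and the slot strings used with it are made up. The fallback to the first candidate when every host is busy follows the pool-selection commit message.

```python
from collections import defaultdict


def claim_pool_candidates(candidates, try_claim):
    """Pick one host per slot.

    candidates: iterable of (host, slot) pairs, where slot identifies the
                full test target (product + version + arch + addons).
    try_claim:  callable(host) -> bool, True if the host was free and is
                now locked by us.
    Returns a dict mapping each slot to the chosen host.
    """
    by_slot = defaultdict(list)
    for host, slot in candidates:
        by_slot[slot].append(host)

    chosen = {}
    for slot, hosts in by_slot.items():
        if len(hosts) == 1:
            # Nothing to choose between: connect it the usual way.
            chosen[slot] = hosts[0]
            continue
        picked = next((h for h in hosts if try_claim(h)), None)
        # All busy: fall back to the first candidate and let the
        # [lock] wait policy decide whether to queue or fail.
        chosen[slot] = picked if picked is not None else hosts[0]
    return chosen
```

Keying the groups on the whole slot rather than on arch alone is what keeps SP5 and SP7 hosts from collapsing into a single pick.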

Together: a pool gives several agents distinct hosts; lock wait makes them queue when the pool is exhausted; workspaces + background jobs keep several updates moving per agent/client.

Notes

Workspaces inside one process share the OS pid, so the (user+pid) refhost lock treats them as one owner — they never block each other on a host; coordinate same-process workspaces so they don't drive the same host destructively at once. [lock] wait and pool selection matter across separate processes/agents.
location is optional: with none configured/specified it defaults to the default bucket of refhosts.yml; pool selection ignores location entirely.

Tests

Full suite green (1312 passed locally). New coverage for workspaces, background jobs, lock-wait, and pool search/selection (incl. the SP5-vs-SP7 distinct-slot case). Docs: new "Working on several updates in parallel" section in Documentation/mcp.rst.

codecov · 2026-06-22T15:54:16Z

Codecov Report

❌ Patch coverage is 92.64706% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.85%. Comparing base (15761b8) to head (9a8dd8a).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| mtui/mcp/session.py | 83.33% | 13 Missing ⚠️ |
| mtui/test_reports/testreport.py | 89.28% | 12 Missing ⚠️ |
| mtui/mcp/tools.py | 94.38% | 5 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #216      +/-   ##
==========================================
+ Coverage   86.52%   86.85%   +0.32%     
==========================================
  Files         160      161       +1     
  Lines        8980     9369     +389     
==========================================
+ Hits         7770     8137     +367     
- Misses       1210     1232      +22


Make the refhost pool usable from a single mtui-mcp client: run several workspaces' host phases in parallel on distinct pool hosts, with no refhosts.yml change (the pool already lists many hosts per arch).

The problem: the remote /var/lock/mtui.lock is keyed on user+pid, so two workspaces in one process (same pid) both see a lock as "mine", and the lock cannot keep them off the same host. Pool selection (openSUSE#216) therefore wasn't safe within one client.

- host_arbiter.HostArbiter: a process-global, thread-safe map of refhost -> owning workspace with a wait queue. One instance per SessionRegistry, shared by every session it mints; each session is bound to it under its registry key (the owner) via McpSession.bind_arbiter.
- TestReport pool selection is now arbiter-aware: a candidate held by another workspace in this process is skipped, and if every candidate is held the claim queues up to [lock] wait seconds for one to be released (acquire_any), i.e. "multiple queues per refhost". Falls back to the prior remote-lock-only path when no arbiter is bound (REPL / single session).
- Cross-process / manual visibility uses the existing lock mechanism: a claim takes the remote lock with an identifying comment ("mtui-mcp pool <RRID> [<owner>]"), so other mtui-mcp servers and manual `mtui` users see the host busy and by what; release_pool_claims() removes that remote lock on workspace close (McpSession.close, also fired by the idle sweeper) so hosts do not leak as locked.
- Stale mtui-mcp pool locks are reaped by the normal reap_if_stale path (commented locks are not exempt), recovering a crashed server's claims.

Tests: HostArbiter (claim/release/release_owner/acquire_any incl. the queue-until-released wait); arbiter-aware selection (skip other-workspace host, none-free, remote-locked-retry); release_pool_claims unlocks remote + drops ownership; reap_if_stale reaps an mtui-mcp pool-commented lock.
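An in-process arbiter of this kind can be sketched with a condition variable. The method names (claim, release, release_owner, acquire_any) are taken from the commit message; the bodies, argument order and return values below are assumptions, and the real HostArbiter may differ.

```python
import threading
import time


class HostArbiter:
    """Sketch of a thread-safe map of refhost -> owning workspace."""

    def __init__(self):
        self._owners = {}
        self._cond = threading.Condition()

    def claim(self, host, owner):
        """Take `host` for `owner`; False if another owner holds it."""
        with self._cond:
            if self._owners.get(host, owner) != owner:
                return False
            self._owners[host] = owner
            return True

    def release(self, host, owner):
        with self._cond:
            if self._owners.get(host) == owner:
                del self._owners[host]
                self._cond.notify_all()  # wake queued acquire_any callers

    def release_owner(self, owner):
        """Drop every claim held by `owner`, e.g. on workspace close."""
        with self._cond:
            for host in [h for h, o in self._owners.items() if o == owner]:
                del self._owners[host]
            self._cond.notify_all()

    def acquire_any(self, hosts, owner, wait=0.0):
        """Claim the first free host, queueing up to `wait` seconds."""
        deadline = time.monotonic() + wait
        with self._cond:
            while True:
                for host in hosts:
                    if self._owners.get(host, owner) == owner:
                        self._owners[host] = owner
                        return host
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                self._cond.wait(remaining)
```

Owners here are plain strings standing in for registry keys, which sidesteps the user+pid problem: two workspaces in one process have different keys even though they share a pid.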

…earch_pool/locked_by

Compute pool slots in-command via query() instead of Refhosts.search_pool, and use the public Target.is_locked() for --free instead of Target.locked_by (both of which are added by openSUSE#216 and absent on main). Keeps this PR independent.

…one client

Every tool now takes an optional `workspace` selector (default "default"). Each distinct name resolves, via the existing SessionRegistry, to its own isolated McpSession: own loaded template, own `targets`, own per-session lock. Because the lock is per-session, calls in different workspaces run concurrently (each blocking body in its own thread), so one stdio client (Claude Code) can advance update B while update A's slow host op runs, instead of load_template tearing A's hosts down to touch B.

- registry: `workspace_key`/`split_workspace_key` compose/parse the per-client base key + workspace name; `resolve_session` grows a `workspace` arg (default reproduces today's one-session-per-client); `live_sessions()` snapshot for listing.
- main: stdio now uses a SessionRegistry too (idle sweeper disabled: a workspace left quiet while you work another must keep its hosts). The default workspace is minted lazily, so callers that never name one are unaffected.
- tools / testreport_tools: surface the `workspace` parameter and thread it through; it is popped before argv encoding so it never leaks to the CLI.
- new tools: list_workspaces (this client's workspaces + their loaded template and hosts) and close_workspace (disconnect a workspace's hosts and drop it). Both are scoped to the calling client.
- tests: workspace key round-trip, per-workspace and cross-client isolation, default-workspace equivalence, live_sessions snapshot.

Slow host commands (run/update/downgrade/prepare/install/uninstall/set_repo/reboot) gain a `background=true` flag. Instead of holding the request open for the minutes the op takes, it returns a job id at once and runs the command in an asyncio task that still acquires the session lock for its duration (so it serialises against the workspace's other mutating calls exactly like a foreground call). The client polls and meanwhile drives other workspaces: the practical "don't block the desk on one slow host op".

- session: per-session job table + start_job (background runner), job_list, job_status, job_result (returns stdout when done, surfaces the command's failure envelope when failed, tells the caller to poll while running), job_cancel (with the documented mid-SSH detach caveat).
- tools: SLOW_COMMANDS gain the `background` parameter; new job_list / job_status / job_result / job_cancel tools, all workspace-scoped.
- tests: done / failed / still-running / unknown-id / list / cancel paths.

…nt sharing)

When a refhost is locked by another session/agent, TargetLock.lock() can now queue on it, polling until the foreign lock is released (or reaped as stale, or becomes ours), instead of raising TargetLockedError immediately. This lets several agents share one refhost pool: a host in use is waited for, not errored on. This matters because separate agents are separate processes (distinct pid), so the lock genuinely excludes them (workspaces inside one mtui-mcp process share a pid and never contended).

- config: [lock] wait (seconds, default 0 = unchanged fail-fast) and [lock] wait_poll (seconds, default 15).
- locks: _wait_for_release polls up to `wait`, logging a warning when it starts waiting (so a REPL user still sees the host is locked and that mtui is now waiting; the connect-time warning is untouched) and on timeout; on timeout the caller raises TargetLockedError as before. _int_cfg reads the options defensively.
- tests: fail-fast default, wait-then-succeed (released mid-poll, fake clock), wait-then-timeout.

Note: this is the lock-handling half of the refhost-pool work; auto-selecting a *free* candidate per arch from a multi-host pool (so agents pick different hosts rather than queue on one) is the remaining infra step.

…jobs, lock wait)

New "Working on several updates in parallel" section covering the workspace argument + list/close_workspace tools, the background=true slow-op flag + job_status/job_result/job_list/job_cancel, and the [lock] wait/wait_poll options, with the same-pid caveat for workspaces inside one process. Synopsis updated to mention named workspaces.

… (parallel agents)

Completes the refhost-pool half of the parallelism work (the lock-wait commit handles queueing; this picks the host). When refhosts.yml lists several interchangeable hosts for the same test target and [refhosts] pool_select is on, add_host connects just one *free* host per target instead of the whole matching matrix: it tries candidates in turn, skips any locked by another agent, and claims (locks) the one it takes, so parallel agents drawing from the same pool end up on different hosts. If all candidates are busy it falls back to the first and the [lock] wait policy governs the wait.

The selection slot is the full test-target identity the update asks for (product + version + arch + addons), NOT just arch. So an update spanning all arches of e.g. SLE15-SP5 and SP7 still gets a host for every (service-pack, arch) pair; only genuine duplicates (several hosts for the very same target) collapse to one. The slot is keyed on the matched query attribute, so a host carrying an extra addon still pools with a plainer host that satisfies the same target. The pool is searched across all locations (location ignored).

- config: [refhosts] pool_select (bool, default false -> unchanged behaviour).
- store: search_pool() returns (host, slot) pairs, slot = str(matched attribute); with all_locations=True aggregates across every location, de-duplicated by name (a host binds to the first slot it matches).
- testreport: refhosts_from_tp records each candidate's slot under pool mode; connect_targets first runs _claim_pool_candidates, which groups pending candidates by slot and, per multi-candidate slot, connect+claims the first free host (_claim_first_free) and drops the rest. Gates use `is True` so a MagicMock test config can't trip the path.
- docs: pool_select + the "location is optional / defaults to default" note.
- tests: search_pool all-locations / same-target-one-slot / distinct-SP-distinct-slot / fallback / dedupe; selection first-free / all-busy / lock-race / reaped-stale / reduces-only-within-a-slot.

Fixups for the lint/format/typecheck CI on this branch:

- tools.py: register_workspace_tools / register_job_tools now globals().setdefault("Context", Context), not just to clear F401 but because FastMCP's find_context_parameter runs get_type_hints against the module (with `from __future__ import annotations`), so the closures' string `Context | None` annotation must resolve in module globals for ctx to be injected. Mirrors testreport_tools.
- target.py: add Target.try_claim() and Target.locked_by(), encapsulating the pool probe+claim so refhost-pool selection no longer reaches into the private Target._lock (clears SLF001) and drops the bare-except probe.
- testreport.py: _claim_first_free uses target.try_claim()/locked_by().
- locks.py: collapse the lock-wait guard into one `if` (SIM102).
- ruff format (locks/registry/tools/testreport); test cast() so the unbound TestReport methods accept the duck-typed fake under ty.

ruff check/format clean; ty clean for the touched files; tests green.

…laim, pool helpers

Raise patch coverage on the parallelism code codecov/patch flagged:

- test_mcp_tool_layer: register the workspace/job/testreport tools on a capturing fake server and drive the registered coroutines through a real SessionRegistry (list/close workspace incl. the no-provider-support path, job_list/status/result/cancel lifecycle, and the resolve_session hop in each testreport tool).
- test_target_try_claim: Target.try_claim branch matrix (free / locked-by-other / stale-reaped / own-lock / lost-race) + locked_by.
- test_pool_helpers_extra: real _pool_lock_comment, _int_cfg fallback, release_pool_claims skip/unlock-error branches, _disconnect_candidate teardown error.

Cuts the PR's uncovered new lines ~104 -> ~31; full suite 1476 passing.

plusky force-pushed the feat/mcp-parallelism branch from f667d83 to baf77ff Compare June 22, 2026 16:11

plusky force-pushed the feat/mcp-parallelism branch from 88d54d9 to c8614c3 Compare June 23, 2026 08:47

plusky added 9 commits June 23, 2026 12:33

docs(mcp): pool selection works within one client (in-process arbiter…

09f37a9

…) + lock visibility/reaping

plusky force-pushed the feat/mcp-parallelism branch from b6da297 to 9a8dd8a Compare June 23, 2026 10:34

plusky wants to merge 9 commits into openSUSE:main from plusky:feat/mcp-parallelism
