mcp: work on several updates in parallel (workspaces, background jobs, lock wait)#216
Open
plusky wants to merge 9 commits into
Codecov Report❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #216      +/-   ##
==========================================
+ Coverage   86.52%   86.85%   +0.32%
==========================================
  Files         160      161       +1
  Lines        8980     9369     +389
==========================================
+ Hits         7770     8137     +367
- Misses       1210     1232      +22
f667d83 to
baf77ff
Compare
plusky
added a commit
to plusky/mtui
that referenced
this pull request
Jun 22, 2026
Make the refhost pool usable from a single mtui-mcp client: run several workspaces' host phases in parallel on distinct pool hosts, with no refhosts.yml change (the pool already lists many hosts per arch).

The problem: the remote /var/lock/mtui.lock is keyed on user+pid, so two workspaces in one process (same pid) both see a lock as "mine", and the lock cannot keep them off the same host. Pool selection (openSUSE#216) therefore wasn't safe within one client.

- host_arbiter.HostArbiter: a process-global, thread-safe map of refhost -> owning workspace with a wait queue. One instance per SessionRegistry, shared by every session it mints; each session is bound to it under its registry key (the owner) via McpSession.bind_arbiter.
- TestReport pool selection is now arbiter-aware: a candidate held by another workspace in this process is skipped, and if every candidate is held the claim queues up to [lock] wait seconds for one to be released (acquire_any), i.e. "multiple queues per refhost". Falls back to the prior remote-lock-only path when no arbiter is bound (REPL / single session).
- Cross-process / manual visibility uses the existing lock mechanism: a claim takes the remote lock with an identifying comment ("mtui-mcp pool <RRID> [<owner>]"), so other mtui-mcp servers and manual `mtui` users see the host busy and by what; release_pool_claims() removes that remote lock on workspace close (McpSession.close, also fired by the idle sweeper) so hosts do not leak as locked.
- Stale mtui-mcp pool locks are reaped by the normal reap_if_stale path (commented locks are not exempt), recovering a crashed server's claims.

Tests: HostArbiter (claim/release/release_owner/acquire_any incl. the queue-until-released wait); arbiter-aware selection (skip other-workspace host, none-free, remote-locked-retry); release_pool_claims unlocks remote + drops ownership; reap_if_stale reaps an mtui-mcp pool-commented lock.
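The in-process arbiter this commit message describes could look roughly like the sketch below. Only the method names (`claim`, `release`, `release_owner`, `acquire_any`) come from the commit message; the class name `HostArbiterSketch`, the signatures, the use of `threading.Condition`, and the rule that re-claiming your own host succeeds are assumptions made for illustration, not the PR's implementation.

```python
import threading
import time
from typing import Optional, Sequence


class HostArbiterSketch:
    """Hypothetical in-process map of refhost -> owning workspace."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}
        self._cond = threading.Condition()

    def claim(self, host: str, owner: str) -> bool:
        """Take `host` for `owner`; True if it was free or already ours."""
        with self._cond:
            return self._owners.setdefault(host, owner) == owner

    def release(self, host: str, owner: str) -> None:
        """Give `host` back if `owner` holds it, and wake any waiters."""
        with self._cond:
            if self._owners.get(host) == owner:
                del self._owners[host]
                self._cond.notify_all()

    def release_owner(self, owner: str) -> None:
        """Drop every claim held by `owner`, e.g. when its workspace closes."""
        with self._cond:
            for host in [h for h, o in self._owners.items() if o == owner]:
                del self._owners[host]
            self._cond.notify_all()

    def acquire_any(
        self, hosts: Sequence[str], owner: str, wait: float
    ) -> Optional[str]:
        """Claim the first free host, waiting up to `wait` seconds for one."""
        deadline = time.monotonic() + wait
        with self._cond:
            while True:
                for host in hosts:
                    if self._owners.setdefault(host, owner) == owner:
                        return host
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None  # every candidate stayed busy
                self._cond.wait(remaining)
```

A condition variable keeps waiters asleep until some claim is released, which matches the "queue until released" behaviour the tests in the commit message mention; a plain polling loop would also work.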
plusky
added a commit
to plusky/mtui
that referenced
this pull request
Jun 23, 2026
88d54d9 to
c8614c3
Compare
plusky
added a commit
to plusky/mtui
that referenced
this pull request
Jun 23, 2026
…earch_pool/locked_by Compute pool slots in-command via query() instead of Refhosts.search_pool, and use the public Target.is_locked() for --free instead of Target.locked_by (both of which are added by openSUSE#216 and absent on main). Keeps this PR independent.
mimi1vx
pushed a commit
to plusky/mtui
that referenced
this pull request
Jun 23, 2026
mimi1vx
pushed a commit
that referenced
this pull request
Jun 23, 2026
…ol/locked_by Compute pool slots in-command via query() instead of Refhosts.search_pool, and use the public Target.is_locked() for --free instead of Target.locked_by (both of which are added by #216 and absent on main). Keeps this PR independent.
…one client

Every tool now takes an optional `workspace` selector (default "default"). Each distinct name resolves, via the existing SessionRegistry, to its own isolated McpSession: own loaded template, own `targets`, own per-session lock. Because the lock is per-session, calls in different workspaces run concurrently (each blocking body in its own thread), so one stdio client (Claude Code) can advance update B while update A's slow host op runs, instead of load_template tearing A's hosts down to touch B.

- registry: `workspace_key`/`split_workspace_key` compose/parse the per-client base key + workspace name; `resolve_session` grows a `workspace` arg (default reproduces today's one-session-per-client); `live_sessions()` snapshot for listing.
- main: stdio now uses a SessionRegistry too (idle sweeper disabled: a workspace left quiet while you work another must keep its hosts). The default workspace is minted lazily, so callers that never name one are unaffected.
- tools / testreport_tools: surface the `workspace` parameter and thread it through; it is popped before argv encoding so it never leaks to the CLI.
- new tools: list_workspaces (this client's workspaces + their loaded template and hosts) and close_workspace (disconnect a workspace's hosts and drop it). Both are scoped to the calling client.
- tests: workspace key round-trip, per-workspace and cross-client isolation, default-workspace equivalence, live_sessions snapshot.
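The key round-trip mentioned under "registry" and "tests" can be illustrated with a minimal sketch. The function names come from the commit message; the `::` separator, the signatures, and the handling of a bare key are assumptions, so the real encoding may differ.

```python
DEFAULT_WORKSPACE = "default"
_SEP = "::"  # hypothetical separator, chosen only for this sketch


def workspace_key(base_key: str, workspace: str = DEFAULT_WORKSPACE) -> str:
    """Compose a registry key from the per-client base key and a workspace name."""
    if _SEP in workspace:
        raise ValueError(f"workspace name must not contain {_SEP!r}")
    return f"{base_key}{_SEP}{workspace}"


def split_workspace_key(key: str) -> tuple[str, str]:
    """Parse a registry key back into (base_key, workspace)."""
    base_key, sep, workspace = key.rpartition(_SEP)
    if not sep:
        # A key without a separator is treated as the default workspace.
        return key, DEFAULT_WORKSPACE
    return base_key, workspace
```

Splitting from the right means a base key that itself contains the separator still round-trips, as long as workspace names are not allowed to contain it.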
Slow host commands (run/update/downgrade/prepare/install/uninstall/set_repo/reboot) gain a `background=true` flag. Instead of holding the request open for the minutes the op takes, it returns a job id at once and runs the command in an asyncio task that still acquires the session lock for its duration (so it serialises against the workspace's other mutating calls exactly like a foreground call). The client polls and meanwhile drives other workspaces: the practical "don't block the desk on one slow host op".

- session: per-session job table + start_job (background runner), job_list, job_status, job_result (returns stdout when done, surfaces the command's failure envelope when failed, tells the caller to poll while running), job_cancel (with the documented mid-SSH detach caveat).
- tools: SLOW_COMMANDS gain the `background` parameter; new job_list / job_status / job_result / job_cancel tools, all workspace-scoped.
- tests: done / failed / still-running / unknown-id / list / cancel paths.
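A rough sketch of the job-table idea follows, assuming an `asyncio.Lock` as the session lock and a blocking command body run in a worker thread. `SessionSketch`, `Job`, the job-id format, and the status strings are hypothetical; only the names `start_job`, `job_status`, and `job_result` appear in the commit message.

```python
import asyncio
import itertools
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    task: asyncio.Task
    command: str


class SessionSketch:
    """Hypothetical stand-in for a per-workspace session with a job table."""

    def __init__(self) -> None:
        self.lock = asyncio.Lock()
        self.jobs: dict[str, Job] = {}
        self._ids = itertools.count(1)

    def start_job(self, command: str, body: Callable[[], str]) -> str:
        """Schedule `body` in the background and return a job id at once."""

        async def runner() -> str:
            # Hold the session lock for the whole run, like a foreground call.
            async with self.lock:
                return await asyncio.to_thread(body)

        job_id = f"job-{next(self._ids)}"
        self.jobs[job_id] = Job(asyncio.create_task(runner()), command)
        return job_id

    def job_status(self, job_id: str) -> str:
        job = self.jobs.get(job_id)
        if job is None:
            return "unknown"
        if not job.task.done():
            return "running"
        if job.task.cancelled():
            return "cancelled"
        return "failed" if job.task.exception() else "done"

    def job_result(self, job_id: str) -> str:
        """Return the output when done; re-raise the failure if the job failed."""
        if self.job_status(job_id) == "running":
            raise RuntimeError("still running, poll again")
        return self.jobs[job_id].task.result()
```

`start_job` has to be called from inside a running event loop, because `asyncio.create_task` needs one. Because the lock is taken inside the task rather than by the caller, the request returns immediately while the job still queues behind other mutating calls in the same workspace.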
…nt sharing)

When a refhost is locked by another session/agent, TargetLock.lock() can now queue on it (poll until the foreign lock is released, or reaped as stale, or becomes ours) instead of raising TargetLockedError immediately. This lets several agents share one refhost pool: a host in use is waited for, not errored on. Matters because separate agents are separate processes (distinct pid) so the lock genuinely excludes them (workspaces inside one mtui-mcp process share a pid and never contended).

- config: [lock] wait (seconds, default 0 = unchanged fail-fast) and [lock] wait_poll (seconds, default 15).
- locks: _wait_for_release polls up to `wait`, logging a warning when it starts waiting (so a REPL user still sees the host is locked and that mtui is now waiting; the connect-time warning is untouched) and on timeout; on timeout the caller raises TargetLockedError as before. _int_cfg reads the options defensively.
- tests: fail-fast default, wait-then-succeed (released mid-poll, fake clock), wait-then-timeout.

Note: this is the lock-handling half of the refhost-pool work; auto-selecting a *free* candidate per arch from a multi-host pool (so agents pick different hosts rather than queue on one) is the remaining infra step.
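The polling behaviour described above can be sketched as a small loop with an injectable clock, which is what makes the "fake clock" tests possible. The function name `wait_for_release`, its parameters, and the local `TargetLockedError` class are stand-ins for illustration; the actual `_wait_for_release` also logs warnings and handles stale-lock reaping, which this sketch omits.

```python
import time
from typing import Callable


class TargetLockedError(Exception):
    """Stand-in for mtui's error of the same name."""


def wait_for_release(
    is_foreign_locked: Callable[[], bool],
    wait: float = 0,
    wait_poll: float = 15,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Return once the host is free; raise TargetLockedError after `wait` seconds."""
    deadline = clock() + wait
    while is_foreign_locked():
        remaining = deadline - clock()
        if remaining <= 0:
            # With wait=0 this triggers on the first check: fail-fast.
            raise TargetLockedError("host still locked")
        # Never sleep past the deadline.
        sleep(min(wait_poll, remaining))
```

With the defaults (`wait=0`) a locked host raises immediately, which mirrors the unchanged fail-fast behaviour the commit describes.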
…jobs, lock wait) New "Working on several updates in parallel" section covering the workspace argument + list/close_workspace tools, the background=true slow-op flag + job_status/job_result/job_list/job_cancel, and the [lock] wait/wait_poll options, with the same-pid caveat for workspaces inside one process. Synopsis updated to mention named workspaces.
… (parallel agents)

Completes the refhost-pool half of the parallelism work (the lock-wait commit handles queueing; this picks the host). When refhosts.yml lists several interchangeable hosts for the same test target and [refhosts] pool_select is on, add_host connects just one *free* host per target instead of the whole matching matrix: it tries candidates in turn, skips any locked by another agent, and claims (locks) the one it takes, so parallel agents drawing from the same pool end up on different hosts. If all candidates are busy it falls back to the first and the [lock] wait policy governs the wait.

The selection slot is the full test-target identity the update asks for (product + version + arch + addons), NOT just arch. So an update spanning all arches of e.g. SLE15-SP5 and SP7 still gets a host for every (service-pack, arch) pair; only genuine duplicates (several hosts for the very same target) collapse to one. The slot is keyed on the matched query attribute, so a host carrying an extra addon still pools with a plainer host that satisfies the same target. The pool is searched across all locations (location ignored).

- config: [refhosts] pool_select (bool, default false -> unchanged behaviour).
- store: search_pool() returns (host, slot) pairs, slot = str(matched attribute); with all_locations=True aggregates across every location, de-duplicated by name (a host binds to the first slot it matches).
- testreport: refhosts_from_tp records each candidate's slot under pool mode; connect_targets first runs _claim_pool_candidates, which groups pending candidates by slot and, per multi-candidate slot, connect+claims the first free host (_claim_first_free) and drops the rest. Gates use `is True` so a MagicMock test config can't trip the path.
- docs: pool_select + the "location is optional / defaults to default" note.
- tests: search_pool all-locations / same-target-one-slot / distinct-SP-distinct-slot / fallback / dedupe; selection first-free / all-busy / lock-race / reaped-stale / reduces-only-within-a-slot.
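The group-by-slot selection described above can be sketched in a few lines. This is an illustration under stated assumptions, not the PR's `_claim_pool_candidates`: the function name, the `try_claim` callback, and the slot strings are hypothetical, and connecting to hosts is left out entirely.

```python
from collections import defaultdict
from typing import Callable, Iterable


def claim_pool_candidates(
    candidates: Iterable[tuple[str, str]],
    try_claim: Callable[[str], bool],
) -> list[str]:
    """Pick one host per slot from (host, slot) pairs.

    `try_claim(host)` is a hypothetical callback returning True when the
    host was free and is now locked for the caller.
    """
    by_slot: dict[str, list[str]] = defaultdict(list)
    for host, slot in candidates:
        by_slot[slot].append(host)

    chosen = []
    for hosts in by_slot.values():
        if len(hosts) == 1:
            # Nothing to choose between: keep the only candidate as-is.
            chosen.append(hosts[0])
            continue
        # First free host wins; if all are busy fall back to the first one
        # and leave the waiting to the lock-wait policy.
        chosen.append(next((h for h in hosts if try_claim(h)), hosts[0]))
    return chosen
```

For example, with two hosts in one slot and the first one busy, the second is chosen, while a host in a different slot (say another service pack) is always kept, so only genuine duplicates collapse to one.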
Fixups for the lint/format/typecheck CI on this branch:
- tools.py: register_workspace_tools / register_job_tools now
globals().setdefault("Context", Context) — not just to clear F401 but
because FastMCP's find_context_parameter runs get_type_hints against the
module (with `from __future__ import annotations`), so the closures' string
`Context | None` annotation must resolve in module globals for ctx to be
injected. Mirrors testreport_tools.
- target.py: add Target.try_claim() and Target.locked_by(), encapsulating the
pool probe+claim so refhost-pool selection no longer reaches into the
private Target._lock (clears SLF001) and drops the bare-except probe.
- testreport.py: _claim_first_free uses target.try_claim()/locked_by().
- locks.py: collapse the lock-wait guard into one `if` (SIM102).
- ruff format (locks/registry/tools/testreport); test cast() so the unbound
TestReport methods accept the duck-typed fake under ty.
ruff check/format clean; ty clean for the touched files; tests green.
…) + lock visibility/reaping
…laim, pool helpers

Raise patch coverage on the parallelism code codecov/patch flagged:

- test_mcp_tool_layer: register the workspace/job/testreport tools on a capturing fake server and drive the registered coroutines through a real SessionRegistry (list/close workspace incl. the no-provider-support path, job_list/status/result/cancel lifecycle, and the resolve_session hop in each testreport tool).
- test_target_try_claim: Target.try_claim branch matrix (free / locked-by-other / stale-reaped / own-lock / lost-race) + locked_by.
- test_pool_helpers_extra: real _pool_lock_comment, _int_cfg fallback, release_pool_claims skip/unlock-error branches, _disconnect_candidate teardown error.

Cuts the PR's uncovered new lines ~104 -> ~31; full suite 1476 passing.
b6da297 to
9a8dd8a
Compare
Lets one client, or several agents, keep more than one update moving at once, without each `load_template` tearing the previous update's hosts down. Five commits.

### 1. Named workspaces

Every tool gains an optional `workspace` selector (default `"default"`). Each name resolves, via the existing `SessionRegistry`, to its own isolated `McpSession` (own loaded template, `targets`, per-session lock). Calls in different workspaces run concurrently, so a single stdio client can advance update B while update A's slow host op runs. stdio now uses the registry too (idle sweeper disabled). New tools: `list_workspaces`, `close_workspace`. The `default` workspace reproduces today's behaviour.

### 2. Async background jobs

Slow host commands (`run`, `update`, `downgrade`, `prepare`, `install`, `uninstall`, `set_repo`, `reboot`) take `background=true`: the call returns a job id immediately instead of holding the request open, and the command runs under the workspace lock. Poll with `job_status`, fetch with `job_result` (`job_list`/`job_cancel` too).

### 3. Wait for a busy refhost

Separate agents are separate processes, so a refhost lock genuinely excludes them. `[lock] wait` (seconds, default `0` = fail-fast) makes `lock` queue on a busy host, polling every `[lock] wait_poll` seconds until released/reaped/ours, instead of erroring. The connect-time warning is untouched; a warning is logged on wait-start/timeout so a REPL user still sees the host is busy.

### 4. Refhost pool selection

When `refhosts.yml` lists several interchangeable hosts for the same test target and `[refhosts] pool_select` is on, `add_host` connects just one free host per target instead of the whole matrix: it tries candidates in turn, skips any locked by another agent, and claims (locks) the one it takes, so parallel agents draw distinct hosts. The target is the full `product + version + arch + addons` the update asks for, not just arch, so an update spanning all arches of SLE15-SP5 and SP7 still gets a host per (service-pack, arch); only genuine duplicates collapse to one. Searched across all locations (location ignored). Off by default.

Together: a pool gives several agents distinct hosts; `lock wait` makes them queue when the pool is exhausted; workspaces + background jobs keep several updates moving per agent/client.

### Notes

- `[lock] wait` and pool selection matter across separate processes/agents.
- `location` is optional: with none configured/specified it defaults to the `default` bucket of `refhosts.yml`; pool selection ignores location entirely.

### Tests

Full suite green (1312 passed locally). New coverage for workspaces, background jobs, lock-wait, and pool search/selection (incl. the SP5-vs-SP7 distinct-slot case). Docs: new "Working on several updates in parallel" section in `Documentation/mcp.rst`.