Backlog

Tracks implementation work against DESIGN.md. Mark items [x] as they land. Phases mirror DESIGN.md §10.

Progress

2026-06-01

ssh_server_info dual-surface (v1.5.0) -- New read-tier tool plus an mcp://ssh-mcp/server-info MCP resource that share a single _collect_server_info helper. Returns {name, version, total_tools, enabled_tiers, enabled_groups} -- the LLM (and operators) can self-introspect "what version am I talking to / which tiers + groups does the operator have unlocked / how many tools are visible". Resource is the primary discovery path (free per turn, doesn't cost catalog tokens); tool is the fallback for clients that don't surface resources to the LLM (Claude Code, Claude Desktop). Same JSON payload shape on both surfaces. Lives in group:host. No new env vars. Tests: 1649 passing / 3 skipped (+6: 5 server-info + 1 SKILL ascii). ADR-0029. (id: server-info-dual-surface-v1.5.0)
- Open subdocs: none -- AGENTS/SECURITY/CONFIGURATION untouched (no security boundary shifted, no operator-facing knob added).
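A minimal sketch of the dual-surface idea, assuming a plain-Python shape for the shared helper. The function names and parameters below are illustrative (the real `_collect_server_info` signature is not shown in this entry), and the MCP registration decorators are deliberately left out:

```python
import json


def collect_server_info(name, version, tool_names, enabled_tiers, enabled_groups):
    """Build the payload once so both surfaces return the same shape.

    All parameters are hypothetical inputs; only the five payload keys
    come from the backlog entry.
    """
    return {
        "name": name,
        "version": version,
        "total_tools": len(tool_names),
        "enabled_tiers": sorted(enabled_tiers),
        "enabled_groups": sorted(enabled_groups),
    }


def server_info_tool(**kwargs) -> dict:
    # Tool surface: returns the structured payload directly.
    return collect_server_info(**kwargs)


def server_info_resource(**kwargs) -> str:
    # Resource surface: same payload, serialized as JSON text.
    return json.dumps(collect_server_info(**kwargs))
```

Because both wrappers delegate to one helper, a test only needs to compare the decoded resource body against the tool result to lock the "same JSON payload shape" guarantee.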

2026-05-30

CAS concurrent-writer safety for ssh_host_notes_append (v1.4.0, hot-patch) -- INC-065 fix. Operator hit a real lost-update where two MCP server processes both appended to the same notes/<host>.md and one entry vanished. The read-modify-write was atomic at the FS level (tmp+os.replace) but had no logical CAS. Fix: optimistic CAS via (mtime_ns, size) snapshot captured at read, re-checked right before os.replace; 5-iteration retry loop in ssh_host_notes_append that rebuilds against fresh snapshots when a concurrent writer beats us. Pathological contention raises RuntimeError instead of unbounded spin. ssh_host_notes_set deliberately stays last-writer-wins (whole-file replace; CAS variant with explicit expected_etag deferred). Microseconds-wide TOCTOU window between final stat() and os.replace is documented and accepted for our contention level; fcntl.flock rejected by operator preference. Tests: 1643 passing / 3 skipped (+3 CAS-specific). INC-065 recorded. (id: notes-append-cas-v1.4.0)
- BACKLOG candidate for v1.14+: ssh_host_notes_set(expected_etag=...) opt-in CAS -- caller passes the snapshot etag they read from ssh_host_notes, write rejects on mismatch with a clear "file changed since you read it, re-read first" error.
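The optimistic-CAS loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the shipped code: `append_with_cas` and `_snapshot` are hypothetical names, and the small window between the final `stat()` and `os.replace` remains, as the entry notes.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path

MAX_ATTEMPTS = 5  # the entry above describes a 5-iteration retry loop


def _snapshot(path: Path) -> tuple[int, int] | None:
    """Return (mtime_ns, size), or None when the file does not exist yet."""
    try:
        st = path.stat()
    except FileNotFoundError:
        return None
    return (st.st_mtime_ns, st.st_size)


def append_with_cas(path: Path, entry: str) -> int:
    """Append `entry`, retrying when another writer changes the file first.

    Returns the attempt number that succeeded.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        before = _snapshot(path)  # captured at read time
        try:
            current = path.read_text(encoding="utf-8")
        except FileNotFoundError:
            current = ""
        fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                fh.write(current + entry)
            # Re-check right before the swap; a mismatch means we lost the race.
            if _snapshot(path) != before:
                continue
            os.replace(tmp, path)
            return attempt
        finally:
            # Remove the temp file on retry or error; after a successful
            # os.replace it no longer exists.
            if os.path.exists(tmp):
                os.unlink(tmp)
    raise RuntimeError(f"gave up after {MAX_ATTEMPTS} contended attempts: {path}")
```

Rebuilding the content from a fresh read on every iteration is what prevents the lost update: a retry never reuses text computed against a stale snapshot.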
Sudo-tier path-bearing tools + path-aware cheatsheet (v1.4.0) -- Five new sudo-tier tools: ssh_sudo_read, ssh_sudo_read_redacted, ssh_sudo_write, ssh_sudo_edit, ssh_sudo_sftp_list. All tagged {dangerous, sudo, group:sudo}, routed through full resolve_path policy chain. New service services/sudo_file_ops.py with helpers sudo_read_bytes, sudo_stat_owner, sudo_stat_mode, sudo_atomic_write, sudo_ls_parsed. ssh_sudo_write supports three-way payload mutex (content_text/content_base64/local_path); local_path reads from MCP host into memory then pipes via stdin, capped at SSH_LOCAL_TRANSFER_MAX_BYTES (2 GiB) vs 256 MiB for inline. ssh_sudo_edit preserves both ownership AND mode (sudo stat -c '%a' + sudo stat -c '%U:%G' pre-step; 0o600 secrets files do not get widened to 0o644). Path-aware cheatsheet expansion: 7 new patterns (read-single, read-ambiguous, list-single, sudo-read-single, sudo-write-single, sudo-edit-single, sudo-list-single); read-single / sudo-read-single routes to _redacted variant when path matches redact_paths_globs. Live-verified on iruelg4 (sudo cat .env -> RedactBypassBlocked, ssh_sudo_write mode-0o600 round-trip, ssh_sudo_edit mode-preservation). Phase-3 bug fixed: sudo_atomic_write positional-arg parse failure (sh: Syntax error: word unexpected) -- shell vars inlined at top of script body instead. Tests: 1640 passing / 3 skipped. ADR-0028 recorded. INC-064 updated with partial mitigation. (id: sudo-path-tools-v1.4.0)
- BACKLOG candidates for v1.14+:
  - ssh_sudo_write(local_path=) true streaming (currently in-memory buffer; no chunked pipe to sudo stdin yet)
  - fetch_sudo_password per-call caching: 4 keyring/subprocess lookups per ssh_sudo_edit is operator-visible latency when SSH_SUDO_PASSWORD_CMD is slow
  - Project convention: shell-script-body construction tests should dry-run via subprocess.run(["sh", "-n", "-c", body]) to catch parse errors before live-verify (the Phase-3 positional-arg bug would have been caught in milliseconds)
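The dry-run convention proposed in the last bullet could look like this in a test helper. A sketch only: `shell_syntax_error` is a hypothetical name, and it assumes a POSIX `sh` is on the test machine's PATH.

```python
from __future__ import annotations

import subprocess


def shell_syntax_error(body: str) -> str | None:
    """Parse `body` with `sh -n` (no execution); return stderr on a parse error."""
    proc = subprocess.run(
        ["sh", "-n", "-c", body],
        capture_output=True,
        text=True,
        check=False,
    )
    return None if proc.returncode == 0 else proc.stderr.strip()
```

A unit test would then assert `shell_syntax_error(body) is None` for every generated script body. Since `-n` only parses, nothing in the script runs, so it is safe to apply to bodies that would otherwise call `sudo`.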
Secret-redaction policy (v1.4.0) — Full redact layer shipped and live-verified on a real host. New ssh_read_redacted tool (read-tier, group:sftp-read): auto-detects format from extension (env/yaml/json/ini/generic), redacts via 3-layer detection (key-match / PEM-always / entropy), HMAC-SHA256 12-char prefix markers, deterministic across calls. redact_bypass_policy gates raw-content SFTP tools on redact_paths_globs-listed paths (block/warn/audit_only). restricted_globs adds glob-pattern hard-deny alongside the existing prefix restricted_paths. Per-host overrides for all 7 knobs in hosts.toml [defaults] block. hosts.toml.example + .env.example updated with full reference. Audit-line redact_bypass=true field lands via ContextVar side-effect in services/audit.py for warn/audit_only modes. Raw-exec bypass gap documented as INC-064 (by-design; mitigated by not allowlisting cat/less in command_allowlist). Anchor syntax (^PREFIX_, _SUFFIX$, ^EXACT$) in _key_matches prevents over-matching (e.g. ^PASS_, _PASS$ instead of bare PASS). 16 pre-existing mypy errors across ssh/, services/edit_service.py, telemetry.py etc. fixed incidentally in this sprint. Tests: 5 new test modules, 1577 passing / 3 skipped. ADR-0027 recorded. (id: redact-policy-v1.4.0)
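To illustrate two pieces of the entry above, here is a hedged sketch of a deterministic HMAC-SHA256 marker and of the anchor syntax for key patterns. The marker format `[REDACTED:...]`, the case-insensitive comparison, and the substring behaviour of bare patterns are assumptions; only the 12-character prefix and the `^PREFIX_` / `_SUFFIX$` / `^EXACT$` forms come from the entry.

```python
import hashlib
import hmac


def redaction_marker(secret_value: str, key: bytes) -> str:
    """Same (value, key) always yields the same 12-hex-char marker."""
    digest = hmac.new(key, secret_value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"[REDACTED:{digest[:12]}]"  # marker layout is hypothetical


def key_matches(name: str, pattern: str) -> bool:
    """Match a config key against an anchored or bare pattern."""
    starts = pattern.startswith("^")
    ends = pattern.endswith("$")
    core = pattern[1 if starts else 0 : -1 if ends else None].upper()
    name = name.upper()  # assumption: matching ignores case
    if starts and ends:
        return name == core
    if starts:
        return name.startswith(core)
    if ends:
        return name.endswith(core)
    return core in name  # bare pattern: substring match, prone to over-matching
```

With a bare `PASS` pattern a key such as `COMPASS_HEADING` would be redacted; the anchored forms `^PASS_` and `_PASS$` leave it alone while still catching `PASS_DB` and `DB_PASS`.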

2026-05-28

local_path mode for ssh_upload, ssh_deploy, ssh_sftp_download (v1.3.0) — Adds a local_path keyword-only param to all three tools, streaming bytes between the MCP host's own filesystem and the remote target without encoding them into tool-call arguments. New SSH_LOCAL_TRANSFER_ROOTS env var (CSV/JSON allowlist of MCP-host directories; empty = disabled, default) and SSH_LOCAL_TRANSFER_MAX_BYTES (default 2 GiB, separate from the existing 256 MiB base64 cap). New services/local_path_policy.py enforces the MCP-host-side boundary; LocalPathPolicyError added to ssh/errors.py. WriteResult and DownloadResult gain local_path_written: str | None. ssh_sftp_download in local_path mode streams remote to local via atomic tmp+os.replace. Fixes the base64-channel bottleneck where the LLM defensively chunked large uploads into many small tool calls. Documented in ADR-0026; skills updated for ssh_upload, ssh_deploy, ssh_sftp_download; runbooks ssh-deploy-verify and ssh-host-snapshot updated with local_path guidance. (id: local-path-transfer)
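The MCP-host-side boundary might be enforced along these lines. This is a sketch under assumptions: the function name and signature are illustrative, and the real services/local_path_policy.py may differ. It shows the three rules named in the entry: empty roots disable the feature, the resolved path must sit under an allowlisted root, and the size cap is checked separately.

```python
from pathlib import Path

DEFAULT_MAX_BYTES = 2 * 1024**3  # 2 GiB, the documented default


class LocalPathPolicyError(Exception):
    """Stand-in for the error class added to ssh/errors.py."""


def check_local_path(candidate, roots, size, max_bytes=DEFAULT_MAX_BYTES) -> Path:
    if not roots:
        raise LocalPathPolicyError("local_path mode is disabled (no roots configured)")
    # resolve() collapses '..' and follows symlinks before the containment check.
    resolved = Path(candidate).resolve()
    if not any(resolved.is_relative_to(Path(root).resolve()) for root in roots):
        raise LocalPathPolicyError(f"{resolved} is outside the allowed roots")
    if size > max_bytes:
        raise LocalPathPolicyError(f"{size} bytes exceeds the {max_bytes}-byte cap")
    return resolved
```

Resolving both the candidate and each root before comparing keeps a `..` segment or a symlinked root from slipping a path past the allowlist.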

2026-05-22

Exec-discipline sprint kickoff. Eval of a real OS-upgrade session found ~62% of 127 ssh_exec_run calls were avoidable (matched an existing native tool's cheatsheet entry), with 3 anti-patterns dominating: heredoc file-writes that should go through ssh_upload, lookups that should use native sftp/systemctl/docker tools, and ad-hoc shell composites where the script itself wasn't the artefact. Full breakdown in docs/evals/2026-05-22-exec-run-discipline.md. Kicks off the discipline sprint: A1 (this row + correction #8 in .claude/team/corrections.md + eval relocation), C1/C2 (first-class systemctl/apt mutation tools so ssh_sudo_exec systemctl ... and ssh_sudo_exec apt-get ... stop being the default path), B1 (default-on SSH_EXEC_ALLOW_CHEATSHEET_PATTERNS=false reject patterns so heredoc + native-tool-matching commands fail closed at the tool surface), B2 (hint footer pointing the LLM at the matching native tool when a reject fires), D1/D2 (runbook addition + AGENTS.md sweep so the discipline is documented and audit-checkable). (id: exec-discipline-sprint)

2026-05-08

Native --filter kwargs on read-tier docker list tools (v1.8.0) — ssh_docker_ps gains name, status, label, ancestor; ssh_docker_images gains reference (glob-style, supports */?/digests), dangling, label; ssh_docker_compose_ps gains service (trailing positional) and status (7-value compose set including removing). Label key regex widened to [A-Za-z0-9._/-]{1,128} so k8s-style keys (app.kubernetes.io/name) are accepted. All filters validated before I/O via _validate_name, _validate_label_filter, _validate_reference_filter. Argv ordering deterministic and locked by tests in tests/test_docker_read_filters.py. (id: docker-filter-kwargs-sprint)
- Follow-up: defense-in-depth regex tightening -- _DOCKER_NAME_RE, _DOCKER_FILTER_RE, _DOCKER_TIME_RE in _helpers.py should be tightened to re.fullmatch instead of re.match to prevent prefix-match bypass. Deferred from the filter sprint; flagged by senior-reviewer.
- Follow-up: pre-existing format baseline cleanup -- one-shot uv run ruff format sprint on __init__.py, lifecycle_tools.py, dangerous_tools.py, test_docker_top_cp.py. Deferred; formatting noise in those files was pre-existing and out of scope for the filter sprint.
- Follow-up: mypy baseline cleanup -- 9 pre-existing mypy errors across 8 modules outside tools/docker/ (not introduced by the filter sprint). Deferred; needs a dedicated pass to triage and fix without touching unrelated code.
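The prefix-match concern behind the regex follow-up can be shown with the widened label-key pattern from the entry above. The sketch assumes the pattern carries no trailing anchor; whether the real `_DOCKER_*_RE` constants do is not stated here.

```python
import re

# Character class and length bound taken from the filter-sprint entry.
LABEL_KEY_RE = re.compile(r"[A-Za-z0-9._/-]{1,128}")


def is_valid_label_key(key: str) -> bool:
    # fullmatch requires the WHOLE string to match, not just a prefix.
    return LABEL_KEY_RE.fullmatch(key) is not None


hostile = "app.kubernetes.io/name; rm -rf /"

# re.match is satisfied by the valid prefix and ignores the tail.
prefix_only = LABEL_KEY_RE.match(hostile) is not None
strict = is_valid_label_key(hostile)
```

Here `prefix_only` is true while `strict` is false. `fullmatch` also closes a subtler gap: a pattern ending in `$` still accepts a trailing newline under `re.match`, whereas `fullmatch` does not.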
2026-04-30 (latest) Sprint 3 — Consistency cleanup (v1.4.0). Three independent hardening tasks shipped together. (3a) as_str helper consolidation: new services/text.py exposes as_str(value: bytes | bytearray | str | None) -> str (errors-replace UTF-8 decode; None → ""). Replaces two private _as_str helpers (in ssh/exec.py + tools/sftp_read_tools.py) and 12+ inline coercion sites across the codebase. Tight signature — no object accepted. (3b) extra="forbid" on systemctl models: models/systemctl.py was the only model file missing INC-046's strictness. Added _RESULT_MODEL_CONFIG = ConfigDict(extra="forbid") constant + applied to all 9 models (mirrors models/results.py pattern; closes ADR-0025 in DECISIONS.md). (3c) @audited(tier="read") on every read tool (Option B): 16 read tools decorated — SFTP (ssh_sftp_list, ssh_sftp_stat, ssh_sftp_download, ssh_find), sessions (ssh_session_list, ssh_shell_list), host (ssh_host_ping, ssh_host_info, ssh_host_network, ssh_user_info, ssh_host_disk_usage, ssh_host_processes, ssh_host_alerts, ssh_known_hosts_verify, ssh_host_list, ssh_host_notes). New test tests/test_audited_coverage.py — 24 parametrized tests locking the policy in (every read tool MUST carry @audited(tier="read") at module load). (id: consistency-cleanup-sprint3)
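For item (3a), a helper with the stated signature and semantics could be as small as the following. This is a sketch consistent with the description (errors-replace UTF-8 decode, None maps to the empty string), not a copy of services/text.py.

```python
from __future__ import annotations


def as_str(value: bytes | bytearray | str | None) -> str:
    """Coerce process/SFTP output to text without ever raising on bad bytes."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    # Invalid UTF-8 sequences become U+FFFD instead of raising UnicodeDecodeError.
    return bytes(value).decode("utf-8", errors="replace")
```

Keeping `object` out of the signature means a type checker flags call sites that pass something unexpected, rather than the helper silently stringifying it.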
2026-04-27 (latest) resolve_path helper added to path-policy chain — closes the "forgot check_not_restricted" footgun class. New async def resolve_path(conn, path, policy, settings, *, must_exist=True) -> str in services/path_policy.py bundles canonicalize_and_check + check_not_restricted into one call so tool authors can't accidentally skip the restricted-zones check. ssh_transfer and ssh_upload migrated to the new helper; ssh_link's two-sided validation and compose-file call sites deliberately kept on the underlying primitives. Docs updated: DESIGN.md §5.6, TOOLS.md low-access intro + ssh_transfer row, BOOTSTRAP.md §5 Path safety + footgun worked example + security checklist, skills/ssh-transfer/SKILL.md, skills/ssh-upload/SKILL.md. Pure service-layer refactor; no tool behavior changes; no version bump (covered by 1.1.0). Suite: 826 unit pass (unchanged), 1 skipped. (id: resolve-path-refactor)
2026-04-27 ssh_host_ping also auto-injects agent notes — both layers ride on ping (INC-060). Operator: "ssh ping should have option to show all notes too (on by default)." INC-059 had explicitly held the agent layer back from auto-injection on context-budget grounds; operator overruled -- visibility into past-self memory beats the budget concern, especially since most agent sidecars are far smaller than the 256 KiB cap. Added SSH_PING_INCLUDES_AGENT_NOTES: bool = True setting (parallel structure to SSH_PING_INCLUDES_NOTES, independent toggle so operators can mix-and-match) and PingResult.agent_notes: str | None = None field. The injection logic in ssh_host_ping reuses the existing _HOST_NOTES_ALIAS_RE + _read_sidecar helpers from INC-055 (defense-in-depth: even though resolve_host already filters aliases, the regex re-validates before path concatenation). 0-byte sidecars return None (matches ssh_host_notes semantics for cleared-via-_set("") files). 6 new regression tests on top of INC-059's 7 (13 total in tests/test_ping_notes_injection.py): sidecar exists + setting on → populated; sidecar missing → None; setting off → None; SSH_HOST_NOTES_DIR=None → None; 0-byte sidecar → None; independence test exercising all four (operator x agent on/off) combinations. SKILL updates: ssh-host-ping/SKILL.md shows both fields + documents the 256 KiB context-budget caveat + opt-out; ssh-host-notes/SKILL.md "When to call it" rewritten to reframe the dedicated tool as primarily for "re-reads after writes" / "ping injection disabled" cases (the standard discovery flow now puts ping first). .env.example documents the new setting; TOOLS.md ping row updated with both layers. Catalog: still 74 tools (field addition). Suite: 826 unit pass (up from 820), 1 skipped. Ruff: one TC003 stdlib-Path import in test fixture flagged + fixed (moved to TYPE_CHECKING -- only used for annotations). Mypy strict: zero new errors.
2026-04-27 ssh_host_ping auto-injects operator notes — enforcement-by-ergonomics (INC-059). Operator asked: did we make sure the LLM loads host memory first when connecting to a host? Honest audit: no -- INC-055's two-layer notes were only DOCUMENTED as "always call before doing anything substantive" via SKILL files (load on demand) and a has_notes: bool flag on ssh_host_list (LLM had to think to check). Nothing made the LLM actually CALL ssh_host_notes(host) first; if it skipped the SKILL load and reached for ssh_exec_run straight off, the operator's hard-rule constraints never entered context. Fix: auto-inject the operator-baseline notes into ssh_host_ping's result. Ping is the canonical "I'm starting work on this host" probe; LLMs reach for it naturally early in any host-targeted workflow. Riding the notes on ping means the LLM gets the operator's constraints into context for free. Agent-side memory (the LLM's own session-spanning sidecar) is NOT auto-injected -- it can grow to 256 KiB and would bloat every ping; still requires the dedicated ssh_host_notes call. Three options weighed: (1) auto-inject into ping [shipped], (2) auto-inject into EVERY host-acting tool result [unmissable but invasive across ~15 result models], (3) pre-tool-call hook that fails the first call to a host with has_notes=True until ssh_host_notes(host) was called [most authoritarian; LLM would just learn to call notes blindly]. Option 1 picked for surface-area-to-value ratio. New setting SSH_PING_INCLUDES_NOTES: bool = True (config.py) is the opt-out for tool-execution-only deployments where ping should stay minimal. New field PingResult.operator_notes: str | None = None (models/results.py) populated only when setting is true AND the host has notes set; whitespace-only notes treated as absent (matches has_notes logic). 
7 regression tests in tests/test_ping_notes_injection.py covering: notes present + default on injects; setting off omits; no notes returns None; whitespace-only treated as absent; surrounding whitespace stripped; setting off with no notes still None; existing ping fields unaffected. SKILL updates: ssh-host-ping/SKILL.md Returns example shows operator_notes + "When to call it" emphasizes "read them before proposing a plan -- they may forbid the obvious approach you were about to take"; ssh-host-notes/SKILL.md rewritten to describe the operator layer as "auto-injected into ping" and reframe the dedicated tool as primarily for the AGENT layer. TOOLS.md ping row + .env.example updated. Catalog: still 74 tools (field addition, not a new tool). Suite: 820 unit pass (up from 813), 1 skipped. Ruff: one TC003 stdlib-Path import flagged + fixed (moved to TYPE_CHECKING -- safe because from __future__ import annotations makes annotations strings; helpers using Path are not tools so FastMCP's get_type_hints() doesn't see them). Mypy strict: zero new errors.
2026-04-25 Output sanitizer (INC-057) + Pass A extension to file-content surfaces (INC-058). Operator asked what happens when remote tools return "poisoned" output. Audit showed the encoding layer was safe (UTF-8 with errors='replace', JSON escaping, byte-cap) but the content itself was unfiltered -- prompt-injection / display-hijack surface. INC-057 (sanitizer core): new services/output_sanitizer.py with sanitize(text) -> (cleaned, warnings) -- strips ANSI escape sequences (CSI / OSC with BEL or ST terminators / single-byte private-use) and NUL bytes; flags-only on bidi overrides (U+202D/E, U+2066-U+2069 -- the trojan-source attack), zero-width chars (U+200B-U+200D, U+FEFF), C1 controls (U+0080-U+009F), LLM protocol markers (<|im_end|>, </s>, [INST], etc., 16 patterns, case-insensitive), and lines mimicking conversation turns (^User: / ^Assistant: / ^System: / ^Human: / ^AI:, line-start anchored, case-insensitive). Wired into ssh/exec.py run() + run_streaming() after truncation; new output_warnings: list[str] = [] field on ExecResult carries the result. Warnings from stdout + stderr merge into a deduplicated list. Streaming chunk_cb deliberately sees raw bytes (progress is ephemeral); the persisted ExecResult.stdout is always sanitized. Coverage extends transitively to every tool that goes through exec.run() -- ssh_exec_*, ssh_sudo_*, ssh_shell_exec, ssh_broadcast, all 22 ssh_docker_* (via _run_docker.model_dump()). 40 regression tests in tests/test_output_sanitizer.py. INC-058 Pass A (extension to non-exec file-content paths): _run_systemctl widened from 3-tuple to 4-tuple (stdout, stderr, exit_code, output_warnings) -- 8 callers updated, 3 propagate (ssh_systemctl_status / _cat / ssh_journalctl), 5 discard (is_active / is_enabled / is_failed / list_units / show). Their result models gained output_warnings: list[str] = []. 
ssh_sftp_download separately runs the new scan(text) flag-only helper on a UTF-8 view of the bytes -- content_base64 is NOT modified (binary safety) but warnings flag what a text decode would surface so the LLM can sanitize() after decoding if processing as text. DownloadResult gained output_warnings. _run_docker already returned result.model_dump() with INC-057's warnings included, so docker tools (ssh_docker_logs, _inspect, etc.) propagate warnings without code changes. 8 propagation tests in tests/test_output_warnings_propagation.py. The trojan-source meta-loop: every non-ASCII codepoint in the sanitizer + its tests is written as chr(0xNNNN) rather than as literal characters -- IDE bidi-aware lints correctly flagged the literal form as the exact "obfuscated source code" pattern the sanitizer exists to defend against. Module docstring + comments capture this so future readers don't "clean up" the chr() calls. Pass B (filename scanning for ssh_sftp_list / ssh_find + structured docker / host fields) deferred -- lower volume than the file/log content paths. Catalog: still 74 tools (model + middleware change, no new tools). Suite: 813 unit pass (up from 765), 1 skipped. Ruff clean on touched files; mypy strict gains zero new errors and clears 2 pre-existing ones in sftp_read_tools.py via defensive bytes coercion on the SFTP read return.
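A reduced sketch of the strip-versus-flag split described for INC-057. It covers only a subset (CSI and OSC sequences, NUL, bidi controls, zero-width characters, conversation-turn lines); the warning labels are invented for illustration, and non-ASCII codepoints are written with `chr()` in line with the convention the entry describes.

```python
import re

_ANSI_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"             # CSI sequences, e.g. colour codes
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"  # OSC terminated by BEL or ST
)
_BIDI = {chr(cp) for cp in (0x202D, 0x202E, 0x2066, 0x2067, 0x2068, 0x2069)}
_ZERO_WIDTH = {chr(cp) for cp in (0x200B, 0x200C, 0x200D, 0xFEFF)}
_TURN_RE = re.compile(
    r"^(?:user|assistant|system|human|ai):", re.IGNORECASE | re.MULTILINE
)


def sanitize(text: str) -> tuple[str, list[str]]:
    """Strip terminal control material; flag (but keep) suspicious content."""
    warnings: list[str] = []
    cleaned = _ANSI_RE.sub("", text)
    if cleaned != text:
        warnings.append("ansi_stripped")
    if "\x00" in cleaned:
        cleaned = cleaned.replace("\x00", "")
        warnings.append("nul_stripped")
    # Everything below is flag-only: the text itself is left untouched.
    if any(ch in _BIDI for ch in cleaned):
        warnings.append("bidi_override")
    if any(ch in _ZERO_WIDTH for ch in cleaned):
        warnings.append("zero_width")
    if _TURN_RE.search(cleaned):
        warnings.append("conversation_turn")
    return cleaned, warnings
```

Flag-only handling keeps legitimate content (right-to-left text, a log line that happens to start with "System:") intact while still telling the caller that the output deserves a closer look.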
2026-04-25 ssh_link expanded — hard + symbolic links with both-sides path validation (INC-056 cont.). Added symbolic: bool = False parameter to the existing ssh_link (low-access + group:file-ops). When symbolic=True, calls sftp.symlink(src, dst) directly — pure SFTP, src stored verbatim (preserves relative-link semantics). Per GNU ln's "Using -s ignores -L and -P", follow_symlinks is silently ignored in symbolic mode. Both sides path-validated per operator direction: dst goes through normal canonicalize_and_check; src is treated as a TARGET STRING (not a real path -- POSIX permits dangling symlinks, target may not exist) and validated string-wise via reject_bad_characters + relative-resolve against dst's parent + posixpath.normpath + check_in_allowlist + check_not_restricted. Original src text passed verbatim to sftp.symlink(); the policy decision was made on the normalized form, but on-disk semantics keep operator intent. Defense-in-depth rationale: read-through-symlink already re-triggers path policy via canonicalize, but write-time validation also catches the prompt-injection pattern where a malicious prompt creates link -> /etc/shadow as a marker. Dangling targets remain ALLOWED (POSIX-correct). NUL / control bytes in target rejected up front. SKILL.md rewritten with three-mode coverage + the path-policy notes split per mode + new examples for the current → release-vN rolling-release pattern + dangling-target case. TOOLS.md row expanded. INC-056 detail updated. 7 new test cases (14 total in tests/test_link.py) covering: sftp.symlink called with verbatim target, dangling targets succeed (no lstat call), target outside allowlist raises PathNotAllowed, relative target resolved against dst's parent for the policy check, NUL bytes rejected, follow_symlinks silently ignored when symbolic=True. Catalog still 74 tools (parameter expansion of an existing tool, not a new one). Suite: 765 unit pass (up from 758), 1 skipped. Ruff clean; mypy strict adds zero new errors. 
The "flip default to match GNU ln's -P default" question (raised after the operator quoted the GNU man page) was deferred — current default remains follow_symlinks=True matching OpenSSH SFTP's natural behavior; easy to flip later if real-world surprise materializes.
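The string-wise validation of a symlink target can be sketched as below. The helper names are hypothetical, and the control-character check stands in for `reject_bad_characters`. The order of steps follows the entry: resolve relative to dst's parent, normalize, then apply the allowlist and restricted checks to the normalized form.

```python
import posixpath


class PathNotAllowed(Exception):
    """Stand-in for the project's policy error."""


def _under(path: str, root: str) -> bool:
    root = posixpath.normpath(root)
    return root == "/" or path == root or path.startswith(root + "/")


def check_symlink_target(src, dst, allowlist, restricted) -> str:
    """Return the normalized form the policy decision was made on."""
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in src):
        raise ValueError("NUL or control character in symlink target")
    # A relative target is interpreted relative to the directory holding dst.
    normalized = posixpath.normpath(posixpath.join(posixpath.dirname(dst), src))
    if not any(_under(normalized, root) for root in allowlist):
        raise PathNotAllowed(normalized)
    if any(_under(normalized, root) for root in restricted):
        raise PathNotAllowed(normalized)
    return normalized
```

No filesystem call is involved, so dangling targets pass as long as the normalized string is inside policy. The caller would still hand the original `src` text to `sftp.symlink()` so relative links stay relative on disk.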
2026-04-25 ssh_link -- hard links, default -L, opt-in -P (INC-056). New low-access + group:file-ops tool: ssh_link(host, src, dst, ctx, follow_symlinks=True). Default mode calls sftp.link() directly -- pure SFTP, OpenSSH's sftp-server uses linkat(AT_SYMLINK_FOLLOW) so the link resolves the symlink chain (matches ln -L). follow_symlinks=False is ln -P --physical ("make hard links directly to symbolic links"); SFTP can't express that, so it falls back to a shell ln -P -- <src> <dst> invocation via conn.run with shlex.join argv -- same pattern as ssh_cp / ssh_mv's shell fallbacks, doesn't require ln in command_allowlist. Both src and dst route through path policy. Path-policy weakening for -P (documented in the SKILL): canonicalizing src would resolve the symlink we want to point at, defeating -P's point. Compromise -- canonicalize the parent of src (must exist + be allowlisted) and lstat confirms src exists in that dir; restricted-paths check still applies to the constructed full path. The check is "the symlink lives in an allowed dir," not "everywhere this symlink could ever point is allowed." Defensive details: -P mode rejects directory-only src up front, surfaces clean ValueError on lstat failure (not raw SFTPError), raises WriteError on shell non-zero exit. argv-quoted via shlex.join -- no string interpolation into the shell command. POSIX-only via require_posix. Existing dst raises (no -f / force; use ssh_delete first). No -s (symbolic) for v1 -- explicit ask was hard links. 7 regression tests in tests/test_link.py covering both modes' happy paths, dst-exists propagation, -P lstat-missing -> ValueError, -P shell failure -> WriteError, directory-only src rejection, Windows PlatformNotSupported. TOOLS.md row added; skills/ssh-link/SKILL.md authored with explicit "when to use hard links vs ssh_cp" + the path-policy weakening callout. Catalog: 74 tools across 9 groups. Suite: 758 unit pass (up from 750), 1 skipped. Ruff clean; mypy strict adds zero new errors. 
Audit pass earlier in the session confirmed no command-injection vector (argv-list construction, no f-string-into-shell anywhere new).
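The argv-quoting point can be illustrated with the standard library alone. `build_ln_physical` is a hypothetical helper name; the `ln -P -- <src> <dst>` shape is taken from the entry.

```python
import shlex


def build_ln_physical(src: str, dst: str) -> str:
    """Build the fallback command from an argv list, never by interpolation."""
    # '--' ends option parsing, so a src beginning with '-' is not read as a flag.
    return shlex.join(["ln", "-P", "--", src, dst])
```

`shlex.join` quotes each argument individually, so a path containing spaces, quotes, or `;` reaches `ln` as one literal argument. Round-tripping the result through `shlex.split` returns the original argv, which makes the property easy to assert in a test.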
2026-04-25 Per-host two-layer memory: operator baseline + agent sidecar (INC-055). Operator wanted CLAUDE.md-style persistent host memory the LLM could write itself, so durable lessons survive across sessions. Shipped a two-layer model: (1) operator notes -- notes = """...""" field on [hosts.<alias>] in hosts.toml, hard-rule baseline READ-ONLY to the agent ("never install apache2 here", ownership, on-call routing). (2) agent notes -- markdown sidecar at <SSH_HOST_NOTES_DIR>/<alias>.md (default notes/<alias>.md), READ-WRITE by the LLM. Three new tools: ssh_host_notes(host) (safe + read + group:host) returns both layers in one call -- {operator_notes, agent_notes, agent_notes_path, has_notes}; ssh_host_notes_append(host, entry) (low-access) appends a ## <UTC iso8601>\n<entry> block (creates the file with a header on first call); ssh_host_notes_set(host, content) (low-access) replaces the sidecar verbatim for consolidation or reset (empty string allowed -- clears to 0 bytes). All writes atomic via temp+os.replace; capped at SSH_HOST_NOTES_MAX_BYTES (default 256 KiB; the append error tells the LLM to consolidate via _set when approaching). Aliases validated against ^[A-Za-z0-9._-]+$ before being concatenated into a sidecar filename -- defense-in-depth against any future code path that bypasses resolve_host (which already filters). ssh_host_list's HostListEntry carries has_notes: bool true when EITHER layer is non-empty (one stat per host, cheap). Two new settings: SSH_HOST_NOTES_DIR (None disables the agent layer; operator layer remains) and SSH_HOST_NOTES_MAX_BYTES. .env.example + hosts.toml.example + .gitignore updated (sidecars excluded from source control). New HostNotesResult + HostNotesWriteResult models with extra="forbid". Three SKILL.md authored (ssh-host-notes, ssh-host-notes-append, ssh-host-notes-set) -- the append skill explicitly lists what NOT to record (re-derivable facts, ephemeral state, secrets, long verbatim output) so the sidecar stays useful. 
Misunderstanding pivot: first pass shipped operator-write / agent-read only -- a notes field on HostPolicy plus a read-only ssh_host_notes tool. Operator clarified they wanted the agent to write its own notes; the operator's role is to seed hard rules, not maintain everything. Pivoted to the two-layer model that keeps the first pass useful (it's now Layer 1) and adds the missing write side as Layer 2. 20 regression tests in tests/test_host_notes.py: operator-layer parsing, has_notes across both layers (operator only, agent only, both, whitespace, 0-byte sidecar, dir disabled), ssh_host_notes returning both layers cleanly, append creates header on first call + preserves history + rejects empty entries + enforces cap + raises when dir disabled + creates parent dir, set writes verbatim + replaces existing + empty clears to 0 bytes + enforces cap, unknown-alias propagation through resolve_host for all three tools. Catalog: 73 tools across 9 groups (up from 71). Suite: 750 unit pass (up from 727), 1 skipped. Ruff + mypy strict clean on touched files. Three SKILL traps hit + fixed: triple-quote inside a Python docstring (SyntaxError), em-dash in a SKILL front-matter description (ASCII-guard test), unescaped | in an INCIDENTS table row (markdownlint column count).
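The sidecar naming and append-block rules from INC-055 can be sketched as pure functions. Names and the header text are assumptions; the alias pattern, the `## <UTC iso8601>` block layout, and the 256 KiB cap come from the entry. File I/O is left out on purpose.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone
from pathlib import Path

ALIAS_RE = re.compile(r"[A-Za-z0-9._-]+")
MAX_BYTES = 256 * 1024  # documented default of SSH_HOST_NOTES_MAX_BYTES


def sidecar_path(notes_dir: Path, alias: str) -> Path:
    """Validate the alias BEFORE it is concatenated into a filename."""
    if ALIAS_RE.fullmatch(alias) is None:
        raise ValueError(f"invalid host alias: {alias!r}")
    return notes_dir / f"{alias}.md"


def build_appended(existing: str | None, alias: str, entry: str, now: datetime) -> str:
    """Return the new sidecar content with one timestamped block appended."""
    if not entry.strip():
        raise ValueError("empty entries are rejected")
    base = existing if existing else f"# Agent notes for {alias}\n"  # header text is illustrative
    stamp = now.astimezone(timezone.utc).isoformat(timespec="seconds")
    updated = f"{base}\n## {stamp}\n{entry.rstrip()}\n"
    if len(updated.encode("utf-8")) > MAX_BYTES:
        raise ValueError("sidecar would exceed the cap; consolidate via ssh_host_notes_set")
    return updated
```

The cap is measured on the encoded bytes rather than on `len(str)`, since the limit is expressed in bytes and non-ASCII notes would otherwise undercount.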
2026-04-25 ssh_upload / ssh_deploy accept content_text; ssh_exec_run framing sharpened against heredoc misuse (INC-054). Operator reported long opaque ssh_exec_run calls in tool-output transcripts that were probably just cat > path <<EOF-style file writes -- the LLM was reaching for ssh_exec_run because (a) the discouraging language for file writes was buried as one bullet in a 14-row mapping table, and (b) ssh_upload / ssh_deploy required content_base64, which is real friction for plain-text configs. Two parallel fixes: (1) Sharpened framing -- added an explicit "NEVER use ssh_exec_run for file writes" section to both the tool docstring and skills/ssh-exec-run/SKILL.md, called out the four most common patterns by name (cat > path <<EOF, tee path, echo > path, printf > path), and expanded the cheat-sheet from 14 rows to ~22 (every file-write pattern → ssh_upload(content_text=...); added missing entries for ssh_broadcast, ssh_transfer, ssh_host_network, ssh_user_info, ssh_file_hash, ssh_systemctl_*, ssh_journalctl from INC-052 / earlier sprints). (2) Removed encoding friction -- added content_text: str | None = None as a sibling to content_base64 on both ssh_upload and ssh_deploy. Plain UTF-8 (configs, scripts, code) goes via content_text; binaries keep using content_base64. New shared helper _resolve_upload_payload validates the exactly-one-of contract; empty string is a deliberate valid input (writes a zero-byte file) so the validator uses is not None, not truthiness. Existing callers passing content_base64 positionally keep working -- parameter moved from required to optional but stayed in the same position. 7 new tests in tests/test_upload_payload.py covering plain-text encoding, unicode round-trip, empty string, binary round-trip, both-set / neither-set rejection, malformed base64. TOOLS.md rows for both tools updated with the new payload semantics + explicit "use this instead of ssh_exec_run for cat > path <<EOF / tee / echo > path / printf > path" callout. 
SKILL.md files for both tools rewritten with both-payload examples. Suite: 727 unit pass (up from 720), 1 skipped; ruff clean on touched files; mypy strict adds no new errors.
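The exactly-one-of contract, including the deliberate empty-string case, might be validated like this. A sketch only: the real `_resolve_upload_payload` signature and error types are not shown in the entry.

```python
from __future__ import annotations

import base64
import binascii


def resolve_upload_payload(
    content_base64: str | None = None,
    content_text: str | None = None,
) -> bytes:
    """Return the bytes to write, enforcing that exactly one payload is set."""
    # `is not None`, not truthiness: "" is a valid request for a zero-byte file.
    provided = [v for v in (content_base64, content_text) if v is not None]
    if len(provided) != 1:
        raise ValueError("pass exactly one of content_base64 / content_text")
    if content_text is not None:
        return content_text.encode("utf-8")
    try:
        return base64.b64decode(content_base64, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("content_base64 is not valid base64") from exc
```

A truthiness check (`if content_text:`) would treat `content_text=""` as "not provided" and reject a legitimate request to truncate a file, which is the trap the entry calls out.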
2026-04-17 INC-052 step 2 — ssh_transfer + ssh_user_info + ssh_host_network + ssh_host_info extension + audit-log README polish. Closed out the remaining port items from the upstream comparison (analyze/ssh-server-mcp-main/). ssh_transfer (tools/multi_host_tools.py, low-access + group:file-ops) streams a file between two remotes via SFTP channels on both connections -- 256 KiB chunks, atomic write on dst (temp + posix_rename), cleanup-on-failure. Both endpoints route through canonicalize_and_check + check_not_restricted independently so per-host path policy applies; size cap from SSH_UPLOAD_MAX_FILE_BYTES; same-host call rejected (use ssh_cp); cross-platform via SFTP. Throughput bottlenecks at the slower of (src→MCP) and (MCP→dst) -- documented so operators know to use direct scp via ssh_exec_run when A and B already trust each other and want gigabit. 7 tests with a fake SFTP harness (tests/test_transfer.py) covering pre-flight rejection, overwrite gating, size cap, atomic temp+rename, mid-transfer cleanup, throughput field. ssh_user_info (tools/host_tools.py, safe + read + group:host) returns structured /etc/passwd row + group memberships via getent passwd + id -Gn + id -gn in parallel; username=None resolves the SSH user via id -un; username regex-validated (POSIX 3.437) before reaching remote argv; no sudo. Dropped the upstream's list-all-users action -- structured per-user lookup is the win. ssh_host_network (tools/host_tools.py, safe + read + group:host) parses ip -j addr show into {name, state, mac, addresses[]} per interface; kernel-internal fields dropped; busybox hosts without iproute2 get [] instead of a raise. ssh_host_info extended with cpu_model (parses /proc/cpuinfo model name / ARM Model/Hardware fallback), cpu_count (parses nproc), hostname_fqdn (parses hostname -f) -- three new probes added to the existing asyncio.gather(return_exceptions=True) so a missing one doesn't lose the siblings. 
14 parser unit tests in tests/test_host_extensions.py covering Intel/AMD/ARM cpuinfo, ip-json happy-path + garbage-tolerance, passwd-line parsing. README audit-log section (README.md:438-486) documents the ssh_mcp.audit JSON-line schema and gives four jq recipes (errors-last-hour, slowest-dangerous-calls, count-by-tool, trace-by-correlation_id); replaces the upstream's ssh_get_logs audit-query tool per the INC-052 design-no rationale (audit flows one-way to operators, never back to the agent). Three new result models HostNetworkResult / UserInfoResult / TransferResult + helpers NetworkInterfaceAddress / NetworkInterfaceEntry in models/results.py all with extra="forbid" per INC-046. INC-052 status → resolved. ssh_snapshot still deferred (runbook-first); ssh_get_logs + port-forwarding + local-FS SFTP + ssh_cron design-nos hold. Catalog: 71 tools across 9 groups (up from 68). Suite: 720 unit pass (up from 683), 1 skipped. Ruff clean on touched files; mypy strict adds one pre-existing-pattern attr-defined on asyncssh.sftp.FX_NO_SUCH_FILE mirroring low_access_tools.py:158.
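A hedged sketch of the `ip -j addr show` parsing described for ssh_host_network. The input keys (`ifname`, `operstate`, `address`, `addr_info`, `local`, `prefixlen`) are the ones iproute2 emits in JSON mode. The shape of each entry in `addresses` is an assumption, since the entry only names the top-level fields.

```python
import json


def parse_ip_addr_json(stdout: str) -> list[dict]:
    """Reduce iproute2 JSON to {name, state, mac, addresses[]} per interface.

    Returns [] for empty or unparseable output instead of raising.
    """
    try:
        raw = json.loads(stdout)
    except json.JSONDecodeError:
        return []
    if not isinstance(raw, list):
        return []
    interfaces = []
    for item in raw:
        if not isinstance(item, dict) or "ifname" not in item:
            continue  # tolerate garbage entries rather than failing the call
        addresses = [
            {
                "family": addr.get("family"),
                "address": addr.get("local"),
                "prefixlen": addr.get("prefixlen"),
            }
            for addr in item.get("addr_info", [])
            if isinstance(addr, dict)
        ]
        interfaces.append(
            {
                "name": item["ifname"],
                "state": item.get("operstate"),
                "mac": item.get("address"),
                "addresses": addresses,
            }
        )
    return interfaces
```

Kernel-internal fields such as `ifindex`, `qdisc`, and `txqlen` are dropped simply by never being copied into the output.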
2026-04-17 ssh_broadcast — fan-out exec across pre-configured hosts (INC-052 step 1). First port from the upstream tool-surface comparison (analyze/ssh-server-mcp-main/ — TypeScript SSH-MCP server). New dangerous + group:exec tool runs the same command on multiple hosts in parallel, returns a structured per-host result. Hard cap of 50 hosts per call; aliases deduplicated; per-host command_allowlist and platform checked independently so one host's CommandNotAllowed / PlatformNotSupported / transport failure does NOT abort the others. Pre-flight validation distinguishes caller errors (empty list, over-cap, HostNotAllowed/HostBlocked aliases — RAISE up front) from transient per-host failures (captured in the errors map). Result shape {command, results{alias→ExecResult}, succeeded[], failed[], errors{alias→exception-class}, elapsed_ms} — command echoed because the audit decorator records host="?" for fan-out tools, so the result body is the durable record of what ran where. New module tools/multi_host_tools.py (sibling to exec_tools.py; future ssh_transfer will land here too). New result model BroadcastResult with extra="forbid" per INC-046. 13 regression tests in tests/test_broadcast.py: empty/over-cap/typo/blocked rejection, dedup, all-succeed happy path, per-host allowlist denial, Windows host PlatformNotSupported, generic transport-error catch-all, timeout-as-failed, non-zero-exit-as-failed, command echo, unique-acquire pin. INC-045 trap dodged — Context is a runtime import in tools/** because FastMCP's @tool calls get_type_hints() at registration. INC-052 status flipped to partial (broadcast shipped; ssh_transfer + ssh_user_info + ssh_host_info extension still pending; design-no for ssh_get_logs + port forwarding stands). TOOLS.md row added; skills/ssh-broadcast/SKILL.md authored (ASCII-only per test_skills_ascii.py). Catalog: 68 tools, 9 groups. Suite: 683 unit pass (up from 670), 1 skipped; ruff + mypy strict clean on touched files.
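The fan-out contract above (dedup, cap, raise on caller errors, capture per-host failures) can be sketched like this. It is an illustration under assumptions: broadcast, run_on_host and fake_run are hypothetical names, and an integer exit code stands in for the real ExecResult.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

MAX_HOSTS = 50  # hard cap per call, as stated in the entry


async def broadcast(
    aliases: list[str],
    command: str,
    run_on_host: Callable[[str, str], Awaitable[int]],
) -> dict[str, Any]:
    """Run `command` on every alias in parallel; one failure never aborts the rest."""
    unique = list(dict.fromkeys(aliases))  # dedup while preserving order
    if not unique:
        raise ValueError("no hosts given")  # caller error: raise up front
    if len(unique) > MAX_HOSTS:
        raise ValueError(f"too many hosts: {len(unique)} > {MAX_HOSTS}")

    started = time.monotonic()
    outcomes = await asyncio.gather(
        *(run_on_host(alias, command) for alias in unique),
        return_exceptions=True,  # per-host exceptions come back as values
    )
    results: dict[str, int] = {}
    errors: dict[str, str] = {}
    succeeded: list[str] = []
    failed: list[str] = []
    for alias, outcome in zip(unique, outcomes):
        if isinstance(outcome, BaseException):
            errors[alias] = type(outcome).__name__
            failed.append(alias)
            continue
        results[alias] = outcome
        (succeeded if outcome == 0 else failed).append(alias)  # non-zero exit counts as failed
    return {
        "command": command,  # echoed so the result body records what ran
        "results": results,
        "succeeded": succeeded,
        "failed": failed,
        "errors": errors,
        "elapsed_ms": int((time.monotonic() - started) * 1000),
    }


async def fake_run(alias: str, command: str) -> int:
    """Stand-in transport used only for this illustration."""
    if alias == "bad":
        raise ConnectionError("host unreachable")
    return 1 if alias == "nz" else 0
```

Calling asyncio.run(broadcast(["a", "bad", "a", "nz"], "uptime", fake_run)) reports "a" as succeeded and both "bad" and "nz" as failed, with only "bad" in the errors map.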
2026-04-17 Host-catalog introspection + runtime policy reload. Two new group:host tools:
- ssh_host_list (safe tier) — enumerate aliases currently loaded from hosts.toml + SSH_HOSTS_ALLOWLIST. Returns {alias, hostname, port, platform, user, auth_method} — credentials never exposed. Unblocks LLM self-discovery of the fleet without the operator pre-listing aliases in prompts.
- ssh_host_reload (low-access tier, gated by ALLOW_LOW_ACCESS_TOOLS=true) — re-read SSH_HOSTS_FILE from disk and swap the in-memory policy atomically. Returns {loaded, source, added, removed, changed} diff. Validates new file BEFORE swap — parse/validation failure keeps the existing fleet intact (no brick-by-bad-config). Does NOT invalidate pooled connections; live sessions retain original policy until keepalive drops them. Typed accessor hosts_from(ctx) added to tools/_context.py alongside the existing pool_from / settings_from / known_hosts_from — raw lifespan_context["hosts"] access eliminated from tools. Catalog: 67 tools, 9 groups. Suite: 664 unit pass (up from 646), 1 skipped.
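The validate-before-swap behaviour of ssh_host_reload could look roughly like the sketch below. HostCatalog and the load callable are hypothetical stand-ins; the point is only that a failing load leaves the old mapping in place and that the swap is a single reference assignment.

```python
from typing import Callable


class HostCatalog:
    """In-memory alias -> policy mapping with validate-then-swap reload."""

    def __init__(self, hosts: dict[str, dict]) -> None:
        self._hosts = hosts

    def aliases(self) -> list[str]:
        return sorted(self._hosts)

    def reload(self, load: Callable[[], dict[str, dict]]) -> dict[str, list[str]]:
        # load() parses and validates; if it raises, self._hosts is never touched.
        new_hosts = load()
        old_hosts = self._hosts
        diff = {
            "added": sorted(new_hosts.keys() - old_hosts.keys()),
            "removed": sorted(old_hosts.keys() - new_hosts.keys()),
            "changed": sorted(
                alias
                for alias in new_hosts.keys() & old_hosts.keys()
                if new_hosts[alias] != old_hosts[alias]
            ),
        }
        self._hosts = new_hosts  # one reference assignment: readers see old or new, never a mix
        return diff
```

A reader holding the old dict keeps a consistent view, which matches the note that live sessions retain their original policy.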
2026-04-17 Windows file-hash path fixed (INC-028 lineage). _hash_windows was failing against real Windows OpenSSH: Get-FileHash emits Write-Progress records that OpenSSH-for-Windows serializes as CLIXML (#< CLIXML <Objs…) into stderr, and the script ending on an expression caused exit_status=None channel closes. Three-part fix: (1) prepend $ProgressPreference='SilentlyContinue'; to silence progress records; (2) append ;exit 0 to force an exit-status channel request; (3) shape-validating fallback — accept exit_status ∈ {0, None} IFF the digest hex matches the expected length for the requested algorithm. E2E test_file_hash[test_windows11] now green. 5 unit tests updated to use algorithm-length-appropriate fake digests (was 12-char stub for all); 2 new assertions pin the $ProgressPreference line + exit 0 trailer. Suite: 664 unit, 93 e2e pass (up from 72 pre-Win11).
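Part (3) of the fix, the shape-validating fallback, can be illustrated as below. The function name accept_windows_digest and the use of ValueError are assumptions for the sketch; the hex lengths are the standard digest sizes for the four supported algorithms.

```python
import re
from typing import Optional

# Hex-digit length of each digest: 128/160/256/512 bits divided by 4.
_DIGEST_HEX_LEN = {"md5": 32, "sha1": 40, "sha256": 64, "sha512": 128}


def accept_windows_digest(algorithm: str, exit_status: Optional[int], stdout: str) -> str:
    """Accept exit_status 0 or None only when stdout is a digest of the right length."""
    expected_len = _DIGEST_HEX_LEN[algorithm]
    if exit_status not in (0, None):
        raise ValueError(f"hash command failed with exit status {exit_status}")
    digest = stdout.strip().lower()
    if not re.fullmatch(rf"[0-9a-f]{{{expected_len}}}", digest):
        raise ValueError(f"output is not a {algorithm} digest")
    return digest
```

The length check is what makes a missing exit status tolerable: a channel that closed early with truncated or CLIXML output fails the pattern instead of being reported as a hash.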
2026-04-17 systemctl safe-tier domain (8 tools). New group:systemctl — ssh_systemctl_status, ssh_systemctl_is_active, ssh_systemctl_is_enabled, ssh_systemctl_is_failed, ssh_systemctl_list_units, ssh_systemctl_show, ssh_systemctl_cat, ssh_journalctl. All tagged {safe, read, group:systemctl}, version="1.0". Result models in models/systemctl.py. Lifecycle ops (start/stop/restart/reload/enable/disable/daemon-reload) intentionally documented as ssh_sudo_exec systemctl … examples in the runbook rather than first-class tools — they require root on stock hosts and gating them through the sudo tier avoids a false-low-access ergonomic trap. 8 per-tool skills + 1 consolidated runbook at runbooks/ssh-systemd-diagnostics/SKILL.md. _JOURNALCTL_TIME_RE initially copied docker's s/m/h-only posture, caught by the e2e run (since="30d" → ValueError) — tightened to match systemd.time(7): s/m/h/d/w/M/y. Catalog: 65 tools, 9 groups. Suite: 162 unit pass (systemctl), 641 full unit + 1 skipped, 16 e2e pass (8 tests × 2 reachable Linux hosts) + 8 skipped (windows11 unreachable/platform).
2026-04-30 Resolved (Sprint 3, v1.4.0): Pydantic model_config = ConfigDict(extra="forbid") policy now uniform. All 9 systemctl result models in models/systemctl.py now carry _RESULT_MODEL_CONFIG = ConfigDict(extra="forbid") — matching the pattern from models/results.py. Every models/*.py result class is now strict; no per-model exceptions remain. See ADR-0025 in DECISIONS.md.
2026-04-17 Deferred: SSH_RUNBOOKS_DIR provider wiring. lifespan.py mounts SkillsDirectoryProvider on SSH_SKILLS_DIR + SSH_RUNBOOKS_DIR, BUT the latter is wired into _mount_skills — confirm against current code before closing. If already wired, mark this closed retroactively; if not, add a second provider or repoint.
2026-04-17 Deferred: Widen tests/test_skills_ascii.py to cover runbooks/*/SKILL.md. The ASCII-only guard scans skills/ but not runbooks/. Would have caught a pre-review em-dash in the new runbook automatically. Also flags tests/test_systemctl_tools.py:364 which has a minor em-dash in a comment (non-blocking today; would need attention if the policy extends to test comments).
2026-04-17 _canonicalize_posix must_exist=True actually enforces existence now. Surfaced by the full e2e run after the audit-redaction landing: test_cp_mv_edit_patch_deploy mvs a file then expects ssh_sftp_stat on the now-missing source to raise PathNotAllowed, but it raised raw asyncssh.sftp.SFTPNoSuchFile instead. Root cause: _canonicalize_posix ran realpath with no flags when must_exist=True (only added -m for must_exist=False); GNU realpath's default mode requires the parent to exist but tolerates a missing leaf, so the canonicalize step silently succeeded and the missing-file signal leaked out of the next SFTP op as a transport error. Fix is one line: argv.append("-e" if must_exist else "-m"). With -e (canonicalize-existing), every component must exist — exactly the contract the kwarg name advertises. 5 unit-test fixture lines updated to the new argv shape. Suite: 475 unit + 56 e2e pass, 34 e2e skipped (sudo-gated, Windows-no-docker).
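The one-line fix maps must_exist onto GNU realpath's two explicit modes. A minimal sketch, where build_realpath_argv is a hypothetical helper name and the `--` end-of-options separator is an assumption about the surrounding argv:

```python
def build_realpath_argv(path: str, must_exist: bool) -> list[str]:
    """Build the realpath argv; -e requires every component, -m requires none."""
    argv = ["realpath"]
    # Without a flag, GNU realpath tolerates a missing final component,
    # which is how a missing file slipped through when must_exist=True.
    argv.append("-e" if must_exist else "-m")
    argv.extend(["--", path])  # "--" keeps a path starting with "-" from being read as an option
    return argv
```

With this shape there is no call that runs realpath in its default mode, so the kwarg always selects one of the two documented contracts.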
2026-04-17 Argv-secret redaction wired into audit. Closed the long-standing "Redact --password=* / --token=* in telemetry and audit" line under ongoing/cross-cutting. Telemetry side was already covered by exclusion (spans don't attach argv per the redaction posture in telemetry.py module docstring); audit side had a command_hash field that nothing populated AND, when populated, would have stably hashed the raw secret value. Two-part fix: (1) new redact_command_string helper (regex-based, length-preserving, mirrors redact_argv semantics for raw-string commands where shlex.split would fail on partial pipelines); (2) @audited wrapper extracts command: str (ssh_exec_run / ssh_exec_run_streaming / ssh_sudo_exec / ssh_docker_exec) and args: list[str] (ssh_docker_run) into the audit line, while script: bodies (ssh_exec_script) stay deliberately out of the capture per the tool's stdin-only contract. record() itself redacts BEFORE hashing AND strips the :N length suffix via _REDACTED_LEN_SUFFIX_RE so two --password=X invocations with different X produce the same command_hash — dedup-by-shape instead of a stable per-secret fingerprint that would be trivially rainbow-tableable for guessable passwords. 10 new tests pin both helpers and the wiring (including test_record_redacts_secret_flags_before_hashing which asserts hash equality across two different secret values). Suite: 475 unit pass (up from 465), 1 skipped; ruff clean.
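The redact-then-hash ordering can be sketched as follows. The placeholder format `<redacted:N>` and both regexes are assumptions made for illustration; the entry only states that the redaction records a length suffix and that the suffix is stripped before hashing.

```python
import hashlib
import re

# Assumed patterns: the real helpers may cover more flags and quoting forms.
_SECRET_FLAG_RE = re.compile(r"(--(?:password|token)=)(\S+)")
_REDACTED_LEN_SUFFIX_RE = re.compile(r"(<redacted):\d+(>)")


def redact_command_string(command: str) -> str:
    """Replace secret flag values with a placeholder that records only their length."""
    return _SECRET_FLAG_RE.sub(
        lambda m: f"{m.group(1)}<redacted:{len(m.group(2))}>", command
    )


def command_hash(command: str) -> str:
    """Hash the command shape: redact first, then drop the length suffix."""
    redacted = redact_command_string(command)
    shape = _REDACTED_LEN_SUFFIX_RE.sub(r"\1\2", redacted)
    return hashlib.sha256(shape.encode("utf-8")).hexdigest()
```

Because neither the secret nor its length reaches the hash input, two invocations that differ only in the secret value hash identically, and the hash cannot be used to confirm a guessed password.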
2026-04-17 Phase 5 telemetry wiring landed. Closed the last open item on Phase 5: telemetry.span() now actually wraps the three transport call sites it was always supposed to (connection.py:_open_single, exec.py:run + run_streaming, path_policy.py:canonicalize_and_check). Span names match the DESIGN.md naming (ssh.connect, ssh.exec, path.canonicalize); attributes attach host, port, user, auth method, proxy hop count, exit code, duration, timed-out flag, and stdout/stderr byte counts — but never argv strings, path content, or auth secrets, per the redaction posture stated in telemetry.py module docstring. Without OTel installed everything degrades to the existing _NoopSpan so this is a zero-cost addition for users who don't opt into [telemetry]. Three new regression tests in test_telemetry.py lock the wiring: test_exec_run_opens_ssh_exec_span and test_path_policy_opens_canonicalize_span monkeypatch the span import in each consumer module, drive the call site, and assert both the open-time attributes AND the absence of argv/path content in any captured value (this is how a future "let's just attach args for debugging" regression gets caught at PR time); test_connection_module_imports_span is a static binding check because open_connection requires a live asyncssh handshake to exercise end-to-end. Also closed two pre-existing INC-046 leftovers in tests/e2e/: dict-style accesses on HashResult/PingResult/DownloadResult returns (now attribute access). Added uv.lock to .gitignore — the file churns per-machine and was generating noise across every shell that ran a uv command. Suite: 465 unit pass (up from 462), 1 skipped; ruff + mypy strict clean on all touched files.
2026-04-17 SSH_CONFIG_FILE wired through (INC-051, ext: classfang/ssh-mcp-server#22). Closed a doc/code drift uncovered while cross-checking an upstream feature request: the SSH_CONFIG_FILE: Path | None = None field had been declared in config.py, surfaced in .env.example, promised by AGENTS.md:594, and echoed in DESIGN.md:451 — but the only consumer in src/ was the test conftest cleaning it out of the environment; _open_single never passed anything to asyncssh.connect(config=...). Operators setting the field saw zero effect — ProxyJump, IdentityFile, host-alias HostName resolution, and Ciphers/MACs/KexAlgorithms overrides from their personal SSH config were silently ignored. Fix is three small edits: (1) _open_single appends kwargs["config"] = [str(settings.SSH_CONFIG_FILE.expanduser())] when set — expanduser() called explicitly because pydantic's Path coercion does not, and asyncssh treats config-file values as fallbacks for kwargs not explicitly passed so our explicit host/port/username/known_hosts still win; (2) config.py:107-118 adds an _empty_path_to_none field validator so SSH_CONFIG_FILE= (blank in .env.example) doesn't smuggle a Path("") through (truthy, points at CWD); (3) lifespan.py:236-249 emits ssh_config: honoring <abs-path> when set+exists or WARNING when set+missing — asyncssh tolerates a missing config file silently, which makes "I set the env var but ProxyJump still doesn't apply" debug sessions awful. Five regression tests in test_ssh_config_file.py pin the contract: config kwarg appears when set, absent when unset, ~ expanded before forwarding, blank env normalized to None, whitespace-only env normalized to None — pattern is monkeypatch asyncssh.connect with a fake that captures kwargs and raises _StopHere to abort before networking. 
README quickstart now has an "Inheriting from ~/.ssh/config" subsection right after the hosts.toml writeup so first-time operators with a populated SSH config see the option immediately; .env.example knob got a 6-line block-comment with the precedence rule; AGENTS.md §1.3 line tightened with the precedence + use-case detail. Suite: 462 unit pass (up from 457), 1 skipped; ruff + mypy strict clean on all touched files.
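The blank-to-None normalization and the explicit expanduser() call can be shown without pydantic or asyncssh. In this sketch normalize_config_path and connect_kwargs are hypothetical stand-ins for the field validator and the kwargs assembly in _open_single.

```python
from pathlib import Path
from typing import Any, Optional


def normalize_config_path(raw: Optional[str]) -> Optional[Path]:
    """Treat a blank or whitespace-only value as unset."""
    if raw is None or not raw.strip():
        return None  # avoids Path(""), which is truthy and resolves to the CWD
    return Path(raw.strip())


def connect_kwargs(host: str, config_file: Optional[Path]) -> dict[str, Any]:
    """Assemble connect kwargs; the config key is present only when a file is set."""
    kwargs: dict[str, Any] = {"host": host}
    if config_file is not None:
        # "~" is expanded here because building a Path does not expand it.
        kwargs["config"] = [str(config_file.expanduser())]
    return kwargs
```

For example, normalize_config_path("   ") returns None, so connect_kwargs("web", None) carries no config key at all.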
2026-04-17 Release-prep incident sweep (INC-035 → INC-048, plus INC-027 superseded). Closed 14 INCIDENTS entries from the project review in one window, including 7 that surfaced during code review of the e2e-suite landing. Highlights: INC-035 awaited cancelled pump tasks in ssh/exec.py (no more Task exception was never retrieved on timeout); INC-036 snapshotted ConnectionPool._reap_once() so concurrent acquire can't trigger RuntimeError: dictionary changed size; INC-037 replaced the <received> literal in HostKeyMismatch with explicit "asyncssh did not expose the received key" wording + the ssh-keyscan recovery path; INC-038 added return_exceptions=True to ssh_host_info/_alerts asyncio.gather so one missing probe (no uptime, restricted /proc) doesn't lose the others; INC-039 narrowed _atomic_write except Exception: to (asyncssh.Error, OSError) so CancelledError/MemoryError propagate; INC-040 typed _docker_prefix/_compose_prefix from Any to HostPolicy/Settings under TYPE_CHECKING; INC-041 dropped unused tenacity dep; INC-042 introduced a LifespanContext TypedDict + single cast so all 4 ctx accessors drop their # type: ignore[no-any-return]; INC-043 split tools/docker_tools.py from 1020 lines into a docker/ subpackage (_helpers.py 360, read_tools.py 361, lifecycle_tools.py 155, dangerous_tools.py 228) with a 103-line facade preserving all historical imports — two test files updated to monkeypatch the correct submodule; INC-045 ran a 104-finding ruff cleanup then expanded select with ASYNC/PERF/PT/PLE/TCH (waiving ASYNC109 because every tool intentionally exposes timeout= for per-call MCP override); the ruff --unsafe-fixes for TCH002 moved 11 from fastmcp import Context / from pathlib import Path imports under TYPE_CHECKING and broke 131 tests because FastMCP's @tool calls get_type_hints() at registration and pydantic's model_rebuild() does the same on field annotations — restored as runtime imports and pinned via per-file ["TC001", "TC002"] ignores on tools/** and 
models/**; INC-046 (both steps) added ConfigDict(extra="forbid") to all 13 result models (typos at construction now raise ValidationError) AND rewired 22 tools to return their typed BaseModel directly so MCP clients see real schemas in tools/list instead of generic object (~60 test assertions converted from result["foo"] to result.foo); legitimately-merged-dict tools (every ssh_docker_*, ssh_shell_exec, ssh_host_alerts, ssh_known_hosts_verify, ssh_session_*, bimodal ssh_delete_folder, extending ssh_deploy) deliberately stay as dict[str, Any] with rationale captured at the call site; INC-047 introduced ShellSession.exec_scope() async context manager + set_cwd() that asserts self.lock.locked() at the write site, so the "caller forgot to acquire the lock" regression class is now eliminated by construction (3 new regression tests; INC-027 closes as n/a (superseded) because the bypass-the-lock failure mode it worried about is unreachable now); INC-048 type-checks both kwargs["host"] and args[0] in audit.py so a misordered tool signature drops to "?" instead of smearing a __repr__ into the audit stream; INC-028 unblocked Windows ssh_file_hash via PowerShell -EncodedCommand (base64-UTF16LE of a Get-FileHash script with ''-escaped LiteralPath) — sidesteps every cmd.exe / PowerShell shell-quoting corner the prior shlex.join attempt couldn't reach. Suite: 457 unit pass (up from 448), 1 skipped; ruff clean (0 findings under the expanded ruleset). Open: only INC-044 (CI / pre-commit / .python-version scaffolding, deliberately deferred).
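The INC-047 pattern, a context manager that owns the lock plus a write site that refuses to run without it, might be sketched like this. It is a simplified stand-in for ShellSession: the real class has more state, and the sketch raises RuntimeError where the entry describes an assertion.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator


class ShellSession:
    """Minimal stand-in: cwd may only change while the session lock is held."""

    def __init__(self, cwd: str = "/") -> None:
        self.cwd = cwd
        self.lock = asyncio.Lock()

    @asynccontextmanager
    async def exec_scope(self) -> AsyncIterator["ShellSession"]:
        async with self.lock:  # serializes commands on one session
            yield self

    def set_cwd(self, new_cwd: str) -> None:
        if not self.lock.locked():
            raise RuntimeError("set_cwd() called outside exec_scope()")
        self.cwd = new_cwd


async def demo() -> tuple[str, bool]:
    session = ShellSession()
    async with session.exec_scope():
        session.set_cwd("/tmp")  # allowed: lock is held
    try:
        session.set_cwd("/etc")  # rejected: lock was released
    except RuntimeError:
        return session.cwd, True
    return session.cwd, False
```

Note that lock.locked() only proves that some task holds the lock, not that the caller does; the check catches the "forgot to acquire" case rather than every misuse.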
2026-04-17 tests/e2e/ suite + Windows SFTP realpath fix (INC-034). New tests/e2e/ suite drives every registered tool against the real hosts in hosts.toml, with per-alias parametrization, session-scoped fixtures for Settings / pool / hosts.toml loading, and TCP reachability probes so unreachable hosts skip rather than fail. Six test modules: test_e2e_real_hosts.py (core tools — ping / host_info / sftp / file-ops / exec / sessions / shell, 15 test functions), test_e2e_docker.py (full docker + compose lifecycle, auto-skip via docker version / docker compose version probes), test_e2e_sudo.py (gated on SSH_E2E_SUDO_PASSWORD so accidental runs can't mutate production), test_e2e_path_policy.py (allowlist + restricted_paths enforcement — each test rebuilds a narrow ctx because policy.path_allowlist ∪ settings.SSH_PATH_ALLOWLIST with either containing "*" would mask confinement). New e2e pytest marker registered; suite covers all 57 registered tools across 90 parametrized cases; 62 pass + 13 skipped without sudo, opt-in sudo bumps to 83 pass + 7 skipped / 90 total. First full-catalog e2e run against test_windows11 surfaced INC-034 (High): OpenSSH-for-Windows returns SFTP realpath results in Cygwin form (C:\Users → /C:/Users), which _is_windows_absolute rejects — every Windows SFTP path failed with PathNotAllowed: canonicalized path is not absolute. Fixed in _canonicalize_windows by stripping the single leading / when the next two chars form a drive prefix (C:/ or C:\); predicate is tight enough to leave UNC paths (//host/share) alone. Post-merge code review flagged that the fix shipped without a regression unit test; closed by test_sftp_realpath_cygwin_form_is_normalized + test_unc_realpath_is_not_stripped which pin both the Cygwin normalization and the UNC pass-through contract. 
Also this pass: skill/runbook count floors in test_skills_ascii.py tightened (23 → 50, 3 → 7) so a mass-accidental-deletion trips CI instead of sliding through; config.py now explains why there's no symmetric SSH_ENABLE_SKILLS toggle (per-tool skills near-free, runbooks heavy — asymmetry deliberate). Suite: 448 unit pass (up from 446), 1 skipped; e2e suite 62 pass + 13 skipped without sudo.
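The INC-034 normalization can be illustrated with a small predicate. The function name is hypothetical, and the sketch reads "drive prefix" as an ASCII letter, a colon, then a separator.

```python
import string


def strip_cygwin_leading_slash(path: str) -> str:
    """Turn /C:/Users into C:/Users; leave UNC and POSIX-looking paths alone."""
    if (
        len(path) >= 4
        and path[0] == "/"
        and path[1] in string.ascii_letters
        and path[2] == ":"
        and path[3] in "/\\"
    ):
        return path[1:]  # drop only the single leading slash
    return path
```

A UNC path such as //host/share fails the second check (its second character is a slash, not a drive letter), so it passes through unchanged.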
2026-04-16 Skills / runbooks directory split + 5 new runbooks. skills/ now holds only per-tool SKILL.md files (57, one per registered tool); runbooks/ holds multi-tool workflow procedures (8 total). Two separate SkillsDirectoryProvider instances mount the directories; new SSH_ENABLE_RUNBOOKS: bool = True setting skips the runbooks mount for tool-execution-only assistants. Lifespan log distinguishes the two (mounted skills provider at ... vs mounted runbooks provider at ...). Three existing runbooks moved out of skills/: ssh-incident-response, ssh-docker-incident-response, ssh-verify-signature. Five new runbooks added: ssh-deploy-verify (upload + hash-verify + compose_up + log-tail + .bak-<ts> rollback), ssh-host-healthcheck (identity + alerts + disk + processes + uptime → green/yellow/red), ssh-disk-cleanup (find before prune; branches for logs / Docker / app data; never volume-prune from LLM), ssh-integrity-audit (pinning + hash drift + signature verify + SUID delta), ssh-container-rollout (standalone docker run rollout with State.Health verification + image rollback). Cross-refs fixed: skills → runbooks use ../../runbooks/<name>/, runbooks → per-tool skills use ../../skills/<name>/. Test suite: test_skills_ascii.py scans both directories; new test_runbooks_directory_exists guard. Suite: 446 pass, 7 skipped.
2026-04-16 Security findings consolidated into INCIDENTS.md. Central append-only ledger with stable INC-NNN IDs replaces the previous scatter across progress entries, ADR context blocks, and inline code comments. 33 entries migrated: 20 internal findings (INC-003..INC-014, INC-021..INC-028), 5 external-project issue scans, 2 external-feedback ADR-triggers, 5 post-merge code reviews, 1 commit-message-style review. Status index at the top of INCIDENTS.md gives the at-a-glance view; detailed per-entry blocks below with refs to fix commit, tests, ADRs. Code comments across src/ + tests/ migrated from legacy IDs to INC-NNN references. README Architecture section updated.
2026-04-16 Post-review fixes for file-hash + docker events/volumes. B1 (blocking): ssh_file_hash Windows branch used POSIX shlex.join quoting which is incompatible with Windows OpenSSH's cmd.exe / PowerShell host -- the '"'"' single-quote escape is POSIX-shell-only. Windows support gated via require_posix; _hash_windows / _WINDOWS_HASH_ALGO removed; SKILL.md + docstring updated; INC-028 filed as open with the -EncodedCommand fix-forward path. 2 Windows-argv tests replaced by a single PlatformNotSupported assertion. I1: _DOCKER_TIME_RE dropped the d unit -- Go's time.ParseDuration (which docker events --since feeds into) has no day unit (h is its largest), so accepting d would have routed to a confusing time: unknown unit "d" daemon error. Two d-smuggling cases (7d, 1d2h) added to the bad-since parametrize; 7d removed from the good-time list. I2: ssh_docker_volumes docstring now names the empty volumes on any non-zero exit_code semantics explicitly so LLMs don't silently read "no such volume" as "genuinely empty". Default since for ssh_docker_events: 10m -> 1h -- operators paged mid-incident weren't seeing the trigger. Also explicit "--filter" not in argv assertion in the default-argv test to lock the no-filters-means-no-filter-token invariant. Minor: _stat_size now catches asyncssh.Error (base class) not just SFTPError so transport failures also degrade to the -1 sentinel; unused AsyncMock import removed. Suite: 440 pass, 7 skipped.
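The relative-duration half of the I1 fix can be sketched as a regex that admits only s/m/h segments. The pattern below is an assumption, not the project's _DOCKER_TIME_RE, and it deliberately ignores the epoch, RFC3339 and now forms the tool also accepts.

```python
import re

# One or more <digits><unit> segments; "d" is intentionally not a unit.
_RELATIVE_DURATION_RE = re.compile(r"(?:\d+[smh])+")


def is_valid_relative_since(value: str) -> bool:
    """Return True for values like 10m or 24h30m, False for anything with a day unit."""
    return _RELATIVE_DURATION_RE.fullmatch(value) is not None
```

Using fullmatch rather than match is what rejects 1d2h: a prefix-anchored match would stop at the first bad character instead of failing the whole value.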
2026-04-16 Two new read-tier Docker tools: ssh_docker_events + ssh_docker_volumes. Fill the two biggest gaps the docker incident-response runbook exposed. ssh_docker_events runs docker events --since <since> --until <until> --format '{{json .}}' over a bounded time window (we never pass unbounded events — would hang until SSH_COMMAND_TIMEOUT); time-anchor regex accepts relative (10m, 24h30m), Unix epoch, RFC3339, and now; filters: list[str] accepts conservative KEY=VALUE expressions validated by a regex. ssh_docker_volumes combines list + inspect: without name runs volume ls --format '{{json .}}', with name runs volume inspect -- <name> (parity with ssh_docker_inspect). Closes the "don't blind-prune volumes from an LLM turn" gap — operators can now enumerate + inspect before any prune(scope='volume') decision. 29 new tests in test_docker_events_volumes.py covering good/bad time formats, injection-attempt filters, argv shape (default + with-filters), NDJSON parsing, volume ls vs inspect argv, non-zero-exit inspect. Docker incident-response runbook updated to reference both in §2 (events for "what just happened") and §6 (volumes before prune). Catalog: 57 tools, docker group 26, suite 443 pass, 7 skipped.
2026-04-16 Docker incident-response runbook (runbooks/ssh-docker-incident-response/SKILL.md) parallel to ssh-incident-response. Eight-section workflow: full ps --all inventory, host-level resource triage (disk_usage + alerts + stats), inspect-first root-cause for failing containers (exit-code decoding: 137 OOM, 143 SIGTERM, restart-count/healthcheck semantics), running-but-broken diagnosis (top, network-mode, saturated-resources), compose-stack failure path with compose_ps + service-scoped compose_logs, disk-pressure prune path with an explicit "don't volume-prune from an LLM turn" boundary, tiered recovery actions (low-access restart vs dangerous compose up/down), escalation triggers (unhealthy-after-restart, repeated identical exit codes, image-drift flag that cross-refs the signature-verify runbook). Read-only up to Section 5; low-access + dangerous explicitly gated. Podman-agnostic (SSH_DOCKER_CMD). README runbooks section lists all three canonical runbooks. Suite: 412 pass, 7 skipped.
2026-04-16 Signature-verification runbook (runbooks/ssh-verify-signature/SKILL.md) instead of a new tool. Covers GPG / cosign / minisign as a workflow via ssh_exec_run + per-host command_allowlist -- no crypto-library dep, no trust-store management in ssh-mcp, no one-size-fits-all wrapper. Explicit responsibility boundary: CI/CD signs BEFORE artifact reaches the target; ssh-mcp does second-line verify of deployed artifacts only. Anti-pattern called out: distributing pubkey + artifact + signature through the same channel. Per-tool gotchas documented (gpg pinentry hang, Good signature from <unexpected uid>, cosign keyless needs network to Rekor). The ssh_file_hash SKILL's security note rewritten to point here with a clean integrity-vs-authenticity boundary statement. README runbooks section updated. Declined to ship ssh_file_verify_signature per ADR-0009 (workflow tools out, skills in). Suite: 411 pass, 7 skipped.
2026-04-16 ssh_file_hash: standalone read-tier tool for transfer-verification / drift-detection. Computes md5 / sha1 / sha256 / sha512 of a remote file; returns lowercase hex digest + byte size. POSIX runs <algo>sum -- <canonical> (coreutils); Windows runs powershell -NoProfile -NonInteractive -Command "(Get-FileHash -Algorithm <ALGO> -LiteralPath '<path>').Hash" with PowerShell single-quote escape (' -> ''). Kept as standalone two-step verify flow rather than auto-integrated into ssh_upload / ssh_deploy / ssh_docker_cp per operator preference -- it's a debug / manual-verify helper. Path goes through canonicalize_and_check + restricted_paths like every other sftp-read tool. 16 new tests in test_file_hash.py covering invalid algorithm rejection, all four POSIX binaries, path-with-spaces parsing, non-zero-exit → HashError, unparseable-digest → HashError, uppercase-digest lowercased, Windows -Algorithm <ALGO> verification, single-quote escape via shlex.split roundtrip, check_not_restricted invocation (happy-path test catches missing-import class of bug). Catalog: 55 tools, suite 406 pass, 7 skipped.
2026-04-16 Post-review fixes for ssh_docker_cp (review B1 + I1 + I2 + M1 + coverage gap). B1 (blocking): effective_restricted_paths and check_not_restricted were used but not imported -- every real call would have died with NameError. The pre-flight tests passed because they short-circuit on _validate_name / direction validator before reaching the path-policy block. Imports added. I1: rewrote the if/elif over direction to if/else so a future fourth direction added without updating the validator fails fast at the branch instead of silently leaving argv unbound. I2: comment at the first pool.acquire confirms keyed-pool semantics -- the _run_docker re-acquire returns the cached connection, one TCP/SSH session, two channels. M1: ssh_docker_top docstring now explains why the metachar check runs on the raw input BEFORE shlex.split (\n-as-whitespace silently smuggles redirects past per-token checks). Coverage gap: added two happy-path tests in tests/test_docker_top_cp.py that monkeypatch canonicalize_and_check + _run_docker and assert the resulting argv shape for both directions, plus a third that asserts check_not_restricted is invoked. Verified the new tests would have caught B1 by temporarily stripping the imports and observing the expected NameError. M3: to_container added to the bad-container-name parametrize. Suite: 390 pass, 7 skipped.
2026-04-16 (latest) Two new docker tools: ssh_docker_top + ssh_docker_cp. ssh_docker_top (read tier) runs docker top <container> with optional ps_options argv suffix; shell metacharacters in the raw option string are rejected before shlex.split so \n-split tricks can't smuggle redirects. Output is plain ps-style text in stdout (docker has no JSON format for top). ssh_docker_cp (low-access) does bidirectional docker cp with explicit direction: Literal["from_container", "to_container"]; host-side path goes through canonicalize_and_check + restricted_paths like ssh_cp; container-side path intentionally NOT policy-checked (we don't manage policy inside containers). Docstring + SKILL document the compromised-image symlink-surprise caveat. 10 new input-validation tests, 2 new SKILL-ASCII tests, registration+tag assertions extended. Catalog: 54 tools, suite 386 pass, 7 skipped.
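The ordering that matters for ssh_docker_top, rejecting metacharacters on the raw string before tokenizing, can be sketched as below. The metacharacter set and the name parse_ps_options are assumptions for illustration.

```python
import shlex

# Assumed deny-set; the project's actual list may differ.
_SHELL_METACHARS = frozenset(";&|<>`$\n\r")


def parse_ps_options(raw: str) -> list[str]:
    """Validate the raw option string, then split it into argv tokens."""
    found = sorted(set(raw) & _SHELL_METACHARS)
    if found:
        raise ValueError(f"shell metacharacters not allowed: {found!r}")
    # Only now is it safe to tokenize: shlex.split treats "\n" as plain
    # whitespace, so the newline would be gone from the resulting tokens.
    return shlex.split(raw)
```

For comparison, shlex.split("aux\n>/tmp/x") yields ["aux", ">/tmp/x"], with no trace of the newline that separated the two parts in the raw input.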
2026-04-15 (latest) Integration test keypair + known_hosts fixtures landed. The tests/integration/ suite is no longer a placeholder: conftest.py bootstraps an ephemeral ed25519 keypair under tests/integration/keys/ on first run (session-scoped, reused across sessions), session-pins the container's live host key via one known_hosts=None handshake, and builds a ConnectionPool bound to a real HostPolicy. Six real tests replace the old TCP-probe placeholder in test_integration_readonly.py: pool acquire, echo via exec_run, non-zero-exit-is-data, canonicalize_and_check in-scope + out-of-scope, SFTP listdir, pool reuse. All keys + known_hosts files are .gitignore'd. Unit suite still 374 green; integration skips cleanly when the container is down. README Testing section documents the pytest --collect-only + docker compose up + pytest -m integration flow and the rm tests/integration/known_hosts recovery for container recreate.
2026-04-15 (latest) MCP ToolAnnotations derived from tier tags. The MCP spec defaults readOnlyHint=false and destructiveHint=true, so every safe / read tool was surfacing as "destructive" in clients (Claude Desktop, MCP Inspector). _apply_mcp_annotations() runs once in the lifespan, iterates server._list_tools(), and maps our existing tag taxonomy onto ToolAnnotations: safe/read → read-only & non-destructive & idempotent; low-access file ops → additive (non-destructive) except ssh_delete* / ssh_docker_rm* / ssh_docker_prune which stay destructive; dangerous/sudo → destructive + openWorldHint=True everywhere. Five in-process regression tests in tests/test_mcp_annotations.py + a stdio round-trip smoke at scripts/check_annotations.py that spawns the server via the MCP Python SDK and dumps a tool / readOnly / destructive matrix. Suite: 374 pass, 2 skipped.
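The tag-to-annotation mapping can be sketched with a plain dataclass standing in for ToolAnnotations. Hints and hints_for are hypothetical names, and the idempotent / open-world values chosen for low-access and dangerous tools where the entry is silent are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hints:
    read_only: bool
    destructive: bool
    idempotent: bool
    open_world: bool


# Low-access tools that stay destructive, per the entry above.
_DESTRUCTIVE_PREFIXES = ("ssh_delete", "ssh_docker_rm", "ssh_docker_prune")


def hints_for(name: str, tags: set[str]) -> Hints:
    """Derive hint flags from the tier tags, most privileged tier first."""
    if tags & {"dangerous", "sudo"}:
        return Hints(read_only=False, destructive=True, idempotent=False, open_world=True)
    if tags & {"safe", "read"}:
        return Hints(read_only=True, destructive=False, idempotent=True, open_world=False)
    # Remaining tier (low-access): additive unless the name marks a removal.
    return Hints(
        read_only=False,
        destructive=name.startswith(_DESTRUCTIVE_PREFIXES),
        idempotent=False,
        open_world=False,
    )
```

Checking the dangerous and sudo tags first means a tool that carries several tier tags is never downgraded to read-only by accident.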
2026-04-15 (latest) Dropped Poetry for PEP 621 + hatchling + uv. pyproject.toml uses standard [project] metadata instead of [tool.poetry]; hatchling.build as PEP 517 backend (replaces poetry-core). Optional deps moved from [tool.poetry.group.*] to [project.optional-dependencies] (.[tasks], .[telemetry], .[dev]). [tool.poetry.scripts] → [project.scripts]. README installation section rewritten around uv sync / uv run / uvx --from . / pip install -e . with poetry references removed. hosts.toml + hosts.toml.example fingerprint-lookup snippets switched to uv run. Suite: 369 pass, 2 skipped; uvx --from . ssh-mcp verified end-to-end.
2026-04-15 (latest) Podman + arbitrary Docker-compatible CLI support. New SSH_DOCKER_CMD global setting (default docker) + per-host docker_cmd field in hosts.toml (mirrors the existing SSH_DOCKER_COMPOSE_CMD pattern). SSH_DOCKER_COMPOSE_CMD changed to empty default and derives from the docker cmd at runtime — so SSH_DOCKER_CMD=podman automatically yields podman compose without a second knob. All 22 Docker tools now route through _docker_prefix(policy, settings) / _compose_prefix(policy, settings) helpers; _run_docker takes a compose: bool flag and prepends the right prefix. _run_command_for_secret / _run_secret_cmd magic timeout=10 extracted to _SECRET_CMD_TIMEOUT_SECONDS named constants. Also fixed uvx --from . ssh-mcp crash: fastmcp dep now carries extras = ["tasks"] so ssh_exec_run_streaming's TaskConfig decorator validates at import time without manual --with tasks install. 13 new regression cases in tests/test_docker_cmd.py. Suite: 369 pass, 2 skipped.
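The derivation rule (per-host value wins, an empty compose setting falls back to "<docker cmd> compose") can be sketched as follows. The signatures are simplified assumptions: the real helpers take policy and settings objects rather than plain strings.

```python
import shlex
from typing import Optional


def docker_prefix(host_docker_cmd: Optional[str], global_docker_cmd: str = "docker") -> list[str]:
    """Per-host docker_cmd overrides the global SSH_DOCKER_CMD value."""
    return shlex.split(host_docker_cmd or global_docker_cmd)


def compose_prefix(
    host_docker_cmd: Optional[str],
    global_docker_cmd: str = "docker",
    compose_cmd: str = "",
) -> list[str]:
    """An explicit compose command wins; otherwise derive it from the docker prefix."""
    if compose_cmd.strip():
        return shlex.split(compose_cmd)
    return docker_prefix(host_docker_cmd, global_docker_cmd) + ["compose"]
```

With this shape, setting only the docker command to podman yields ["podman", "compose"] for compose calls, while an operator who needs a standalone binary can still set the compose command explicitly.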
2026-04-15 (latest) Post-fix findings INC-024 / INC-025 / INC-026 landed. Closes two real gaps in the docker-run escalation deny-list plus a test cleanup:
- INC-024 (Medium): --mount source=/,target=/host bypassed the existing --volume=/: check. New _mount_source_is_host_root() parser decodes the KV value format (type=bind,source=/,target=...), handles both --mount X and --mount=X forms. posixpath.normpath catches //, /./, trailing-slash variants.
- INC-025 (Low): container-namespace join (--pid=container:victim, --network container:bar, etc. for all six namespace flags × both prefix + two-token forms) now rejected alongside the existing host match.
- INC-026 (Low): tautological (entered[0], exited[0]) == (entered[0], entered[0]) assertion removed from shell-session lock test; the exited[i] == entered[i] pair already pins the serialization invariant.
- INC-027 (Low): deferred per author note ("optional; only if ssh_shell_exec grows").
- 18 new parametrized regression cases in tests/test_docker_run_escalation.py. Suite: 356 pass, 2 skipped. Post-fix verification documented in INCIDENTS.md.
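A rough sketch of the INC-024 check, assuming a simplified parser (function names here are illustrative, not the shipped _mount_source_is_host_root; quoted CSV values are not handled). One detail worth knowing: posixpath.normpath keeps exactly two leading slashes, so "//" normalizes to "//" rather than "/", and the comparison has to tolerate that.

```python
import posixpath
from typing import List

_SOURCE_KEYS = {"source", "src"}  # docker accepts both spellings


def mount_source_is_host_root(value: str) -> bool:
    """True when a --mount KV string binds the host root as its source."""
    for field in value.split(","):
        key, sep, val = field.partition("=")
        if not sep or key.strip().lower() not in _SOURCE_KEYS:
            continue
        normalized = posixpath.normpath(val.strip())
        # "/" and "//" both collapse to an empty string once slashes are stripped.
        if normalized.startswith("/") and normalized.strip("/") == "":
            return True
    return False


def argv_mounts_host_root(argv: List[str]) -> bool:
    """Check both the two-token (--mount X) and prefix (--mount=X) forms."""
    for i, token in enumerate(argv):
        if token == "--mount" and i + 1 < len(argv):
            if mount_source_is_host_root(argv[i + 1]):
                return True
        elif token.startswith("--mount="):
            if mount_source_is_host_root(token[len("--mount="):]):
                return True
    return False
```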
2026-04-15 (latest) Windows SSH targets — Scope 1 minus docker (ADR-0023). New HostPolicy.platform field (default posix, legacy linux/macos/bsd/darwin aliases normalize to posix, new windows option). require_posix() helper raises PlatformNotSupported on POSIX-assuming tools when target is Windows: ssh_host_info/_disk_usage/_processes/_alerts, ssh_exec_*, ssh_sudo_*, ssh_shell_open/_exec, all ssh_docker_*, ssh_cp. Error message names the missing capability and points at SFTP alternatives. Supported on Windows: SFTP file-ops (mkdir, delete, delete_folder via SFTP-walk, upload, edit, patch, deploy, mv without cross-fs fallback), SFTP reads (list, stat, download), ssh_find via SFTP-walk with fnmatch glob, plus ping/known_hosts_verify/session tools. Path policy platform-aware: canonicalize() routes to SFTP realpath (+ ntpath.normpath fallback) on Windows, prefix match case-insensitive + separator-agnostic. path_allowlist validator accepts C:\\... and C:/.... New tests in tests/test_windows_target.py (FakeConn + FakeSFTP shims, 29 cases). Suite: 338 pass, 2 skipped.
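The case-insensitive, separator-agnostic prefix match could look roughly like this. It is a lexical sketch only: the entry above says the real canonicalize() goes through SFTP realpath first, and is_within_allowlist is a hypothetical name.

```python
import ntpath
from typing import List


def _norm(path: str) -> str:
    # normpath collapses ".." and mixed separators; normcase lowercases
    # and converts "/" to "\\" so comparisons ignore case and separator style.
    return ntpath.normcase(ntpath.normpath(path))


def is_within_allowlist(path: str, allowlist: List[str]) -> bool:
    """True when path equals an allowlisted root or sits below it."""
    candidate = _norm(path)
    for root in allowlist:
        normalized_root = _norm(root).rstrip("\\")
        # Match on a component boundary so C:\Deploy does not admit C:\Deployments.
        if candidate == normalized_root or candidate.startswith(normalized_root + "\\"):
            return True
    return False
```

Because ntpath is importable on any platform, the same check can run on a POSIX-hosted server talking to a Windows target.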
2026-04-15 (later) Hardening + ergonomics pass driven by internal review + cross-project issue scan (bvisible/mcp-ssh-manager#13, tufantunc/ssh-mcp#2/#42/#44, classfang/ssh-mcp-server#31):
1. INC-021 / INC-022 / INC-023 fixed: hook task tracking with backlog warning (hooks.py), ssh_docker_run rejects host-escape flags (--privileged, --cap-add, host namespace, host-root volume) by default behind ALLOW_DOCKER_PRIVILEGED (docker_tools.py), per-session asyncio.Lock on ShellSession acquired by ssh_shell_exec (shell_sessions.py + shell_tools.py). Full detail in INCIDENTS.md.
2. Docker list-tools include_labels flag: ssh_docker_ps/_images/_compose_ps strip Labels by default and rewrite stdout as compact NDJSON. OCI labels on common images blew the MCP output cap on hosts with 20+ containers.
3. TTY hint (#31 echo): ExecResult.hint populated when stderr matches is not a tty / must be run from a terminal. Tells the LLM to use batch-mode flags or ssh_exec_script. Defends our "no remote PTY" design choice without surprising the operator.
4. Risky-config hint (#13 echo): _warn_on_risky_config now warns when path_allowlist=["*"] and neither per-host restricted_paths nor env SSH_RESTRICTED_PATHS cover /etc/shadow, /etc/sudoers, /etc/ssh. Warning, not error -- some hosts genuinely don't have these.
5. ssh_exec_run last-resort docstring: tool docstring + SKILL.md now lead with "Last-resort tool" and a 14-row mapping table (mkdir -p ... -> ssh_mkdir, etc.) so the LLM stops reaching for ssh_exec_run when a dedicated wrapper exists.
6. BM25SearchTransform integration (revisits BACKLOG line 139): now reachable via SSH_ENABLE_BM25=true (default OFF), SSH_BM25_MAX_RESULTS=8, SSH_BM25_ALWAYS_VISIBLE=ssh_host_ping,ssh_host_info,ssh_session_list,ssh_shell_list. Replaces tools/list with search_tools + call_tool once 50+ schemas eat too much context per turn.
7. Tool catalog overview at startup: _log_tool_catalog emits tools registered: N total, M visible (after tier+group filters) plus per-tier and per-group counts. Helps operators verify their ALLOW_* / SSH_ENABLED_GROUPS actually does what they think.
8. Logging fix for fastmcp run: fastmcp run fastmcp.json skips our run_server.main(), so root logger stayed at WARNING and our INFO lines vanished into Python's lastResort handler. Lifespan now attaches a stderr handler to the ssh_mcp logger and respects LOG_LEVEL.
- Suite: 309 pass, 2 skipped. 52 tools registered in 8 groups.
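Item 2 above (label stripping) can be sketched as follows, assuming the remote CLI was asked for one JSON object per line (as `docker ps --format '{{json .}}'` produces). strip_labels_ndjson is a hypothetical name, not the shipped helper.

```python
import json


def strip_labels_ndjson(stdout: str, include_labels: bool = False) -> str:
    """Rewrite JSON-lines output as compact NDJSON, dropping Labels by default."""
    rows = []
    for line in stdout.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not include_labels:
            record.pop("Labels", None)
        # Compact separators trim whitespace that would otherwise cost output budget.
        rows.append(json.dumps(record, separators=(",", ":")))
    return "\n".join(rows)
```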
2026-04-14 Phase 0 skeleton landed (pyproject, fastmcp.json, src layout, smoke tests).
2026-04-14 Phase 1a host configuration landed (models/policy.py, hosts.py, 14 loader tests pass). Python target bumped 3.11–3.13 → 3.11–3.14 (ADR-0013 unchanged; FastMCP 3 still the target).
2026-04-14 Phase 1b SSH transport landed: errors, known_hosts loader, agent fingerprint matching, connection opener with ProxyJump, keyed pool with idle reaper. app.py split off to break a circular import between server.py and tool modules.
2026-04-14 Phase 1c read-only tools landed: 11 tools registered (ping, host_info, disk_usage, processes, known_hosts_verify, session_list/stats, sftp_list/stat/download, find). Tests: 33 pass, 1 skipped (integration, needs live sshd). Phase 1 ✅.
2026-04-14 Host blocklist added (ADR-0015): SSH_HOSTS_BLOCKLIST env var, deny wins over allow. Resolution centralized in services/host_policy.py::resolve(); pool uses check_policy() for defense-in-depth. List env vars (SSH_HOSTS_ALLOWLIST, SSH_HOSTS_BLOCKLIST, SSH_PATH_ALLOWLIST, SSH_COMMAND_ALLOWLIST) now accept comma-separated strings in addition to JSON arrays. Tests: 48 pass, 1 skipped.
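A minimal sketch of the dual-format parsing for list-valued env vars, assuming a value is treated as JSON only when it starts with "[" (parse_list_env is a hypothetical name; the real Settings validator may differ):

```python
import json
from typing import List


def parse_list_env(raw: str) -> List[str]:
    """Accept either a JSON array of strings or a comma-separated string."""
    text = raw.strip()
    if not text:
        return []
    if text.startswith("["):
        value = json.loads(text)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError("expected a JSON array of strings")
        return [v.strip() for v in value if v.strip()]
    # CSV form: drop empty segments left by stray or trailing commas.
    return [part.strip() for part in text.split(",") if part.strip()]
```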
2026-04-14 Phase 2 low-access tier landed: services/path_policy.py (remote realpath + allowlist check), services/edit_service.py (structured edit + unidiff patch), 8 tools (mkdir, delete, delete_folder, cp, mv, upload, edit, patch) tagged {low-access, group:file-ops}. All tools SFTP-first with fixed-argv shell fallback; atomic writes via <path>.ssh-mcp-tmp.<hex> + posix_rename; caps enforced from Settings. Tests: 78 pass, 1 skipped. Phase 2 ✅.
2026-04-14 Phase 5 polish landed: tool groups wiring (SSH_ENABLED_GROUPS → per-group Visibility with ADR-0016 permissive default), audit log service + @audited decorator applied to all 8 low-access tools, FastMCP Skills provider mounted when skills/ exists (seed runbook at skills/ssh-incident-response/), telemetry helper (span() + redact_argv), README. BM25 search transform skipped (19 tools, under threshold). Tests: 95 pass, 1 skipped.
2026-04-15 Feature expansion beyond the original 5 phases (all driven by operator feedback):
1. Docker (22 tools) — ssh_docker_ps/logs/inspect/stats/images/compose_ps/compose_logs (read), ssh_docker_start/stop/restart/compose_start/compose_stop/compose_restart (low-access), ssh_docker_exec/run/pull/rm/rmi/prune/compose_up/compose_down/compose_pull (dangerous). All tagged group:docker; new group added to ALL_GROUPS. SSH_DOCKER_COMPOSE_CMD env var (default docker compose) for legacy-binary hosts. Log tools tighten default tail to 50 and default max_bytes to 64 KiB to protect LLM context; tool-level max_bytes param bounded [1 KiB, 10 MiB].
2. Smart Alerts (ssh_host_alerts) — read-only tool evaluating per-host thresholds (disk_use_percent_max, load_avg_1min_max, mem_free_percent_min, optional disk_mounts filter) configured in [hosts.<name>.alerts]. Runs df, /proc/loadavg, /proc/meminfo in parallel; returns structured breaches[] + metrics. No SMTP/Slack/webhook — caller (LLM or cron) decides what to do with the report.
3. Persistent Shell Sessions — 4 new tools (ssh_shell_open/exec/close/list) with group:shell. In-memory SessionRegistry tracks cwd across calls; wrap_command prefixes cd <cwd> + emits __SSHMCP_STATE__<pwd> sentinel for cwd-tracking. No real remote PTY. Four-gate story: tier dangerous + group shell + env ALLOW_PERSISTENT_SESSIONS (hides only open/exec, leaves list/close for drain) + per-host persistent_session = true|false.
4. Smart Deployment (ssh_deploy) — extends ssh_upload with automatic pre-deploy backup: if file exists and backup=True, SFTP posix_rename to <path>.bak-<UTC-iso8601> before writing tmp + rename-into-place.
5. Restricted Paths — per-host restricted_paths + env SSH_RESTRICTED_PATHS carve out zones inside path_allowlist where low-access and sftp-read tools refuse to operate (typical use: SMB-mounted shared data). Exec/sudo tools unaffected (don't go through path policy). New PathRestricted error with explicit "use ssh_exec_run/ssh_sudo_exec" pointer.
6. Hooks infrastructure — HookRegistry + HookEvent (STARTUP/SHUTDOWN/PRE_TOOL_CALL/POST_TOOL_CALL), bounded per-hook timeout, exception isolation, blocking vs non-blocking emit, load_external_hooks(registry, module_path) dotted-path loader. SSH_HOOKS_MODULE env points at an operator module exposing register_hooks(registry). Zero hooks registered by default. Side-effect only for now; blocking pre-hooks deferred.
- Suite: 267 pass, 2 skipped. 52 tools registered in 8 groups.
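The cwd-tracking trick from item 3 (persistent shell sessions) can be sketched like this. Only the __SSHMCP_STATE__ sentinel comes from the entry above; the exact shell wrapper and the split_state helper are assumptions made for illustration.

```python
import shlex
from typing import Tuple

SENTINEL = "__SSHMCP_STATE__"


def wrap_command(cwd: str, command: str) -> str:
    """Run command from cwd, then report the final working directory."""
    return (
        f"cd {shlex.quote(cwd)} && {command}\n"
        # Preserve the command's exit status while appending the sentinel line.
        f"__rc=$?; printf '\\n%s%s\\n' {SENTINEL} \"$PWD\"; exit $__rc"
    )


def split_state(stdout: str, previous_cwd: str) -> Tuple[str, str]:
    """Separate user-visible output from the trailing sentinel line."""
    head, sep, tail = stdout.rpartition(SENTINEL)
    if not sep:
        # Sentinel missing (e.g. the command called exit): keep the old cwd.
        return stdout, previous_cwd
    return head.rstrip("\n"), tail.strip() or previous_cwd
```

Each call runs in a fresh remote shell, so the registry only needs to remember the cwd reported by the previous call; no remote PTY is involved.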
2026-04-15 INC-006 — critical bug found during Windows end-to-end verification: KnownHosts.fingerprint_for unpacked 3 values from asyncssh's 7-tuple match() return and caught ValueError, silently returning None for every lookup. Impact: ssh_host_ping never reported the pinned fingerprint; ssh_known_hosts_verify always reported expected_fingerprint=None; and the INC-007 fix (UnknownHost vs HostKeyMismatch disambiguation) silently degraded to "always UnknownHost" — a real host-key rotation or MITM would have been mislabeled. Fix: use tuple indexing + stop swallowing ValueError. Regression guard: test_fingerprint_for_resolves_real_entry generates a real ed25519 key via asyncssh.generate_private_key and round-trips it through the full match path. Suite: 178 pass, 2 skipped.
2026-04-15 Internal review pass (all 13 findings addressed, detailed status in INCIDENTS.md). Highlights:
- INC-003: SSH_ENABLED_GROUPS added to Settings; without this, every server startup would crash. Regression guard test_config_has_every_field_lifespan_reads.
- INC-004: assert re_canonical == canonical before rm -rf replaced with raise WriteError, which survives python -O.
- INC-005: streaming _pump updates byte counters nonlocally per chunk so stdout_truncated is accurate on timeout.
- INC-007: UnknownHost vs HostKeyMismatch now disambiguated by known_hosts.fingerprint_for lookup, not exception message text.
- INC-008: audit error field reduced to exception class name; full text at DEBUG only.
- INC-009: SSH_SUDO_PASSWORD env-var rejected at startup (hard fail); operators must use SSH_SUDO_PASSWORD_CMD or OS keychain.
- INC-010: non-UTF-8 files in ssh_edit/ssh_patch raise clean WriteError.
- INC-011: streaming chunk_cb receives only captured bytes, matching buffer.
- INC-012: absolute-path command_allowlist entries now require exact match; basename matching restricted to bare entries.
- INC-013: port-range / type / absolute-path validation tests added.
- INC-014: magic SFTP error codes replaced with asyncssh.sftp.FX_* constants.
- Suite: 177 pass, 2 skipped.
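The INC-012 matching rule can be sketched as below. It is a simplified illustration of the stated behavior (absolute entries match exactly, bare entries match by basename); command_allowed is a hypothetical name and not the code in services/exec_policy.py.

```python
import posixpath
import shlex
from typing import List


def command_allowed(command: str, allowlist: List[str]) -> bool:
    """Check the program token of a command against the allowlist."""
    argv = shlex.split(command)
    if not argv:
        return False
    program = argv[0]
    for entry in allowlist:
        if entry.startswith("/"):
            # Absolute entry: exact match only, so /tmp/systemctl cannot
            # impersonate /usr/bin/systemctl.
            if program == entry:
                return True
        elif posixpath.basename(program) == entry:
            # Bare entry: match by basename wherever the binary lives.
            return True
    # An empty allowlist falls through to False, i.e. fails closed.
    return False
```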
2026-04-15 Phase 4 sudo tier landed (per-call mode): ssh/sudo.py builds sudo -S -p '' -- sh -c/s -- wrappers with shlex-quoted commands and pipes password on stdin; fetch_sudo_password priority chain (SSH_SUDO_PASSWORD_CMD → keyring → SSH_SUDO_PASSWORD env → passwordless); startup WARNINGs for env-password and for unsupported persistent-su mode; two new tools ssh_sudo_exec (allowlist-checked) + ssh_sudo_run_script (stdin body) both tagged {dangerous, sudo, group:sudo} and @audited(tier="sudo"); 2 new per-tool skills (ASCII-guarded); README "Sudo" section added with recommended scoped NOPASSWD sudoers pattern. 24 tools total registered. Tests: 159 pass, 2 skipped. All 5 phases complete.
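A minimal sketch of the per-call wrapper shape described here, assuming the sh -c variant only. The function names are illustrative and ssh/sudo.py may assemble the command differently.

```python
import shlex
from typing import Optional


def build_sudo_command(command: str) -> str:
    """Wrap a command as: sudo -S -p '' -- sh -c '<quoted command>'."""
    # -S reads the password from stdin; -p '' suppresses the prompt text so
    # nothing leaks into stderr; shlex.quote keeps the payload a single argument.
    return "sudo -S -p '' -- sh -c " + shlex.quote(command)


def sudo_stdin(password: Optional[str]) -> str:
    """Bytes to pipe on stdin: the password plus newline, or nothing for NOPASSWD."""
    return "" if password is None else password + "\n"
```

For example, build_sudo_command("systemctl restart nginx") produces `sudo -S -p '' -- sh -c 'systemctl restart nginx'`, and the password never appears in argv.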
2026-04-14 External review (Findings: trust-before-verify, read-scope gap, host-policy ambiguity, empty-allowlist footgun, status/planned blur). Tightenings landed: (1) ADR-0017 — path confinement applies to every path-bearing read tool; ssh_sftp_list, ssh_sftp_stat, ssh_sftp_download, ssh_find now route paths through canonicalize_and_check. (2) ADR-0018 — empty command_allowlist now fails closed; new ALLOW_ANY_COMMAND env flag is the only way to permit arbitrary exec. (3) ADR-0019 — allow/block rules evaluate on canonical policy.hostname only; aliases are pure lookup keys. README rewritten: quickstart §2 and troubleshooting now scan-to-tempfile + verify fingerprint out-of-band before appending to known_hosts; Phase 3 marked shipped, Phase 4 sudo mentions watermarked (planned). Regression guard: tests/test_read_tool_path_confinement.py. Suite: 142 pass, 2 skipped.
2026-04-14 Tool skills authored for every tool (22 per-tool + 1 workflow). Each SKILL.md documents tier/group, inputs, returns, when/when-not, an example, common failures, and related tools. Skills are pure ASCII to work around an upstream FastMCP 3.2.4 bug where Path.read_text() is called without encoding= (Windows defaults to cp1252). Regression test: tests/test_skills_ascii.py fails any non-ASCII byte in any SKILL.md. All 23 skills load via SkillsDirectoryProvider. Suite: 134 pass, 2 skipped.
2026-04-14 Phase 3 exec tier landed: ssh/exec.py (run + run_streaming with timeout + pkill cleanup), services/exec_policy.py (command allowlist check), 3 tools (ssh_exec_run, ssh_exec_script, ssh_exec_run_streaming with TaskConfig(mode="optional")). All tagged {dangerous, group:exec}, audited, gated by ALLOW_DANGEROUS_TOOLS. Startup warns if docket backend is in-memory (ADR-0011). 22 tools registered total. Tests: 110 pass, 2 skipped.
2026-04-14 SSH agent integration verified end-to-end against live Pageant on Windows. Fixed two real bugs discovered only by running it: (1) SSHAgentClient(None) is wrong — asyncssh needs "" for auto-detect; _resolve_socket now returns "" on Windows when SSH_AUTH_SOCK is unset. (2) Agent keys are SSHAgentKeyPair, not SSHKey, and lack get_fingerprint() — fingerprints now computed from public_data via SHA-256. pywin32 added as a Windows-only dependency for asyncssh's Pageant backend. New live smoke test (test_live_agent_returns_well_formed_fingerprints) runs against the operator's real agent when one is present. Tests: 95 pass, 2 skipped.
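Computing a fingerprint from raw public-key bytes, as fix (2) describes, can be sketched as follows. The sketch assumes public_data holds the SSH wire-format public key blob and uses the OpenSSH display convention (SHA256: prefix, unpadded base64).

```python
import base64
import hashlib


def sha256_fingerprint(public_blob: bytes) -> str:
    """Return an OpenSSH-style SHA-256 fingerprint for a public key blob."""
    digest = hashlib.sha256(public_blob).digest()
    # OpenSSH prints the base64 digest without trailing "=" padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```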

Phase 0 — skeleton ✅

pyproject.toml with PEP 621 metadata + hatchling build backend, console script, dev/tasks/telemetry optional-dependencies
fastmcp.json pointing at src/ssh_mcp/server.py:mcp_server
.env.example, .gitignore
src/ssh_mcp/ layout: __init__, __main__, run_server, server, lifespan, config
Subpackage stubs: ssh/, services/, models/, tools/
ConnectionPool no-op stub
models/results.py — ExecResult, StatResult, WriteResult
Tier gating wired in lifespan via Visibility(False, tags={...})
Smoke test: imports, server constructs, default-deny config

Phase 1 — read-only tier

1a. Host configuration ✅

1b. SSH transport ✅

ssh/errors.py — UnknownHost, HostKeyMismatch, HostNotAllowed, AuthenticationFailed, AgentFingerprintNotFound, ConnectError, CommandTimeout, PathNotAllowed
ssh/known_hosts.py — loader; missing file → empty (warn); fingerprint_for(host, port) helper
ssh/agent.py — list_agent_fingerprints(), select_agent_key(agent_path, fingerprint) via asyncssh.SSHAgentClient
ssh/connection.py — open_connection() honoring HostPolicy.auth (agent / key / password); passphrase_cmd and password_cmd via subprocess
ssh/pool.py — keyed _Entry pool with per-key asyncio.Lock, proactive 60 s idle reaper, stats() + close_all()
ProxyJump / bastion chaining recursive through the pool (asyncssh.tunnel=)
app.py split from server.py to break the circular import between tools and the FastMCP instance
Startup fail-fast: agent reachable + fingerprint present when identity_fingerprint set (deferred to Phase 1b+ alongside real-host integration)

1c. Read-only tools ✅

Phase 2 — low-access tier ✅

Phase 3 — exec tier ✅

Phase 4 — sudo tier ✅ (per-call only; persistent-su deferred)

Phase 5 — polish ✅

telemetry.py — span() wrapper degrading to a noop when OTel is not wired + redact_argv() for --password=* / --token=* / --secret=* / --api-key=* (length preserved)
services/audit.py — record() emits one JSON line to the ssh_mcp.audit logger (path + command SHA-256-hashed); @audited(tier=...) decorator; applied to all 8 low-access tools
FastMCP Skills provider mounted (_mount_skills in lifespan) when SSH_SKILLS_DIR exists; seed runbook at skills/ssh-incident-response/SKILL.md
Tool groups — SSH_ENABLED_GROUPS wired to per-group Visibility(False, tags={"group:<name>"}) in lifespan. Empty = all groups enabled (ADR-0016). Unknown groups logged and ignored.
README — setup, env vars, hosts.toml schema, tier flags, tool groups, per-host identity patterns, key rotation runbook
Config knob: SSH_SKILLS_DIR (default skills/) for optional skills directory
Evaluate BM25SearchTransform — landed 2026-04-15 behind SSH_ENABLE_BM25 (default OFF). 52 tools is past the 30 threshold; opt-in keeps the small-deployment ergonomics intact.
Wire telemetry.span() into ssh/connection.py, ssh/exec.py, services/path_policy.py — landed 2026-04-17. ssh.connect, ssh.exec (buffered + streaming variants), and path.canonicalize spans attach host / port / exit code / duration without ever recording argv, path content, or auth secrets (redaction posture per telemetry.py module docstring). Wiring locked in by tests/test_telemetry.py::test_*_opens_*_span.
Wire @audited into exec + sudo tools (Phase 3+4) — applied via @audited(tier="dangerous"/"sudo") in tools/exec_tools.py and tools/sudo_tools.py

Known upstream issues

FastMCP 3.2.4 skills loader — fastmcp.server.providers.skills.skill_provider calls Path.read_text() in 5 places without encoding="utf-8". On Windows this defaults to cp1252 and fails on any non-ASCII byte. Workaround: pure-ASCII SKILL.md files (enforced by tests/test_skills_ascii.py). Revisit when FastMCP adds explicit encoding.

Ongoing / cross-cutting

Host blocklist (SSH_HOSTS_BLOCKLIST) with deny-wins precedence, centralized in services/host_policy.py
CSV parsing for list-valued env vars (SSH_HOSTS_ALLOWLIST, SSH_HOSTS_BLOCKLIST, SSH_PATH_ALLOWLIST, SSH_COMMAND_ALLOWLIST)
CI: ruff + mypy + pytest on push
Version bump on any tool-signature change (strict semver MAJOR.MINOR.PATCH)
Redact --password=* / --token=* in telemetry and audit — landed 2026-04-17. Telemetry side: span() already attaches no argv (redaction by exclusion). Audit side: new redact_command_string() helper for raw-string commands, mirrored to redact_argv() for list-form. @audited now extracts command: str (ssh_exec_run / ssh_exec_run_streaming / ssh_sudo_exec / ssh_docker_exec) and args: list[str] (ssh_docker_run) into the audit line; script: bodies (ssh_exec_script) are deliberately NOT captured per the tool's stdin-only contract. record() redacts BEFORE hashing AND strips the :N length suffix so two --password=X calls with different X produce the same command_hash (dedup-by-shape) instead of leaking the secret value via stable-hash rainbow lookup.
Lint rule or review check for shell=True, f-string commands, os.system
Glob / pattern matching for allowlist + blocklist (DESIGN.md §11 Q4 — deferred)
Per-host blocked = true flag in hosts.toml (currently env-only; revisit if operators ask)
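The redact-before-hash behavior from the redaction item above can be sketched as below. The `<redacted:N>` placeholder format and the regexes are assumptions made for illustration; only the helper name redact_command_string and the dedup-by-shape goal come from the backlog text. Quoted secrets containing spaces are not handled.

```python
import hashlib
import re

_SECRET_RE = re.compile(r"(--(?:password|token|secret|api-key)=)(\S+)")
_LEN_SUFFIX_RE = re.compile(r"<redacted:\d+>")


def redact_command_string(command: str) -> str:
    """Replace secret flag values with a length-preserving placeholder."""
    return _SECRET_RE.sub(
        lambda m: f"{m.group(1)}<redacted:{len(m.group(2))}>", command
    )


def command_hash(command: str) -> str:
    """Hash the redacted command with the length suffix stripped."""
    # Dropping ":N" means two calls that differ only in the secret value
    # (or its length) hash identically, so the hash reveals nothing about it.
    shaped = _LEN_SUFFIX_RE.sub("<redacted>", redact_command_string(command))
    return hashlib.sha256(shaped.encode("utf-8")).hexdigest()
```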

Deferred (see DECISIONS.md)

Hooks system (pre/post-connect, pre/post-command)
Workflow tools (backup/restore, db_dump, deploy) — exposed via Skills instead
Port/X11 forwarding, tunneling
Windows target hosts (no longer deferred: landed 2026-04-15 as Scope 1 minus docker, ADR-0023; see Progress)

2026-04-30 — Sprint 5 (v1.6.0)

5a — ssh_host_alerts typed result — HostAlertsResult + AlertBreach Pydantic models replace the previous dict[str, Any] return. extra="forbid" on both models; LLM gets schema validation. Aligns with ADR-0025 / INC-046 consistency. (id: sprint5-alerts-typed)
5b — Output sanitizer reach extension — output_warnings: list[str] added to HostInfoResult and UserInfoResult. ssh_host_info scans uname, uptime, and each os_release value; ssh_user_info scans gecos (attacker-controllable via chfn on shared boxes). Strings unchanged; warnings only (INC-058 pattern). (id: sprint5-sanitizer-reach)
5c — Agent-notes hygiene docs — skills/ssh-host-notes-append/SKILL.md extended with "What is SAFE vs UNSAFE to write" section addressing the self-reinforcing-channel risk from INC-060 ping auto-injection. (id: sprint5-notes-hygiene)
5d — ssh_link internal refactor — 152-line tool split into 3 mode helpers (_create_symbolic_link, _create_hard_link_followed, _create_hard_link_unfollowed). Tool body now dispatch + WriteResult assembly only. Behavior 1:1. Largest cosmetic change of the sprint. (id: sprint5-link-refactor)
5e — session_tools.py merged into shell_tools.py — ssh_session_list moved next to ssh_shell_list (semantic siblings). session_tools.py deleted. server.py, test_audited_coverage.py, e2e test, and DESIGN.md file-tree updated. Closes the deferred Sprint 4 step. (id: sprint5-session-merge)
14 new tests across all five items.

2026-04-30

Sprint 4 — SOC refactor: host_tools.py split (v1.5.0) — tools/host_tools.py trimmed 909→695 lines; new services/host_notes.py (public API: HOST_NOTES_ALIAS_RE, either_notes_present, resolve_sidecar_path, read_sidecar, atomic_write_sidecar) + new tools/host_notes_tools.py (3 notes tools: ssh_host_notes, ssh_host_notes_append, ssh_host_notes_set). No tool surface change. server.py registers host_notes_tools. Addresses code review M6+M7. Step 6 (session_tools merge) deferred due to e2e import coupling. (id: soc-refactor-sprint4)
Sprint 2 — Dead-code purge + OTEL_ENABLED wire-up (v1.3.0) — Removed 9 confirmed-dead items: src/ssh_mcp/ssh/argv.py (full module), CommandTimeout exception class, _partial_on_timeout helper, _DOCKER_VOLUME_FLAGS constant + re-export, SessionRegistry.reap_idle method, SSH_ALLOW_KNOWN_HOSTS_WRITE setting, stale "removed" comment block in sftp_read_tools.py, and ssh_session_stats tool (tests + e2e + SKILL folder deleted). Wired OTEL_ENABLED setting to gate telemetry._get_tracer (was declared but never read; now returns None when false, suppressing all span emissions). Shell-session lifecycle clarified as caller-owned via ssh_shell_open / ssh_shell_close — no idle reaper exists or is intended. Addresses reviewer findings M1-M5, L1, L3, H2, H3. (id: dead-code-purge-sprint2)

2026-05-03

Sprint 1 — Compose-file path-policy tightening (v1.2.0) — Migrated all 5 ssh_docker_compose_* call sites in tools/docker/lifecycle_tools.py from canonicalize_and_check to resolve_path (which bundles canonicalize + allowlist + restricted-zones). Compose files in restricted_paths zones now raise PathRestricted instead of silently executing. Closes INC-061 (code-review 2026-04-30). 5 new parametrized unit tests in tests/test_compose_path_policy.py. Stale canonicalize_and_check carve-out reference removed from DESIGN.md §5.6 and TOOLS.md Docker lifecycle section. All 8 compose SKILL.md files updated with restricted_paths constraint and PathRestricted failure entry. (id: compose-path-policy-sprint1)
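The difference between an allowlist-only check and the bundled resolve_path can be sketched as follows. This is a lexical approximation under stated assumptions: the real helper canonicalizes via remote realpath and takes different arguments, and the exception classes here are local stand-ins for the project's errors.

```python
import posixpath
from typing import List


class PathNotAllowed(Exception):
    """Path falls outside every allowlisted root."""


class PathRestricted(Exception):
    """Path is allowlisted but sits inside a restricted zone."""


def _under(path: str, root: str) -> bool:
    root = root.rstrip("/") or "/"
    return path == root or path.startswith(root.rstrip("/") + "/")


def resolve_path(path: str, allowlist: List[str], restricted: List[str]) -> str:
    """Canonicalize, then apply the allowlist, then the restricted zones."""
    canonical = posixpath.normpath(path)
    if canonical.startswith("//"):
        canonical = canonical[1:]  # normpath keeps a leading "//"; collapse it
    if not canonical.startswith("/"):
        raise PathNotAllowed(canonical)
    if not any(_under(canonical, root) for root in allowlist):
        raise PathNotAllowed(canonical)
    # The step the compose tools were missing: deny zones carved out of the allowlist.
    if any(_under(canonical, zone) for zone in restricted):
        raise PathRestricted(canonical)
    return canonical
```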

2026-04-27

Thread ResolvedHost through systemctl wrapper helpers — T1 (arjancodes sprint) preserved the double-resolve pattern at systemctl_tools.py lines 424, 452, 478, 505, 562, 596, 620, 686 to keep scope tight. Each site calls resolve_host(ctx, host).policy then passes HostPolicy into a helper that internally calls resolve_host again. The same redundancy exists in ssh_cp / ssh_mv / ssh_docker_exec. Mechanical cleanup: thread the ResolvedHost returned by the outer resolve_host call through to each helper, removing the inner re-resolution. Low priority; no behavior change. (id: resolved-host-thread-systemctl)
Adopt SshTransport Protocol at caller signatures — T2 (arjancodes sprint, pending merge from worktree agent-a51fd28feb5f56992) added src/ssh_mcp/ssh/protocols.py with the SshTransport(Protocol) interface covering run, start_sftp_client, close, and wait_closed. No callers were migrated (intentionally — T2 was type-declaration only). Future sprint: update pool.py / exec.py / connection.py parameter signatures from asyncssh.SSHClientConnection to SshTransport so the paramiko-fallback hook described in AGENTS.md §6.5 has a concrete type to plug into. (id: ssh-transport-protocol-callers)
Per-host group overrides (hosts.<name>.groups = [...]) — revisit if operators ask
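For the SshTransport item above, a structural interface along these lines would let callers accept any transport exposing the four named methods. The method names come from the backlog entry; the parameter and return types are assumptions, and FakeTransport / run_and_close are hypothetical helpers added for illustration.

```python
import asyncio
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SshTransport(Protocol):
    async def run(self, command: str, **kwargs: Any) -> Any: ...
    async def start_sftp_client(self) -> Any: ...
    def close(self) -> None: ...
    async def wait_closed(self) -> None: ...


class FakeTransport:
    """Test double; satisfies the protocol structurally, no inheritance needed."""

    def __init__(self) -> None:
        self.closed = False

    async def run(self, command: str, **kwargs: Any) -> Any:
        return f"ran: {command}"

    async def start_sftp_client(self) -> Any:
        return object()

    def close(self) -> None:
        self.closed = True

    async def wait_closed(self) -> None:
        return None


async def run_and_close(conn: SshTransport, command: str) -> Any:
    """Caller typed against the protocol rather than a concrete connection class."""
    try:
        return await conn.run(command)
    finally:
        conn.close()
        await conn.wait_closed()


if __name__ == "__main__":
    print(asyncio.run(run_and_close(FakeTransport(), "uptime")))
```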

2026-04-30 — Sprint 6 (v1.7.0)

APT package tools — new pkg group — 3 new read-tier tools: ssh_apt_list (apt list with installed/upgradable/all modes + glob filter), ssh_apt_search (apt-cache search by name + description), ssh_apt_show (combined apt-cache show + policy for one package). New files: models/apt.py, services/apt_parser.py, tools/apt_tools.py. POSIX-only; non-Debian hosts receive clean PlatformNotSupported via apt-binary probe. Pattern/package argv-validated. All three carry @audited(tier="read"), tags {safe, read, group:pkg}. 3 new SKILL.md files under skills/ssh-apt-list/, skills/ssh-apt-search/, skills/ssh-apt-show/. 87 new tests; 968 unit tests passing total. (id: sprint6-apt-pkg-group)
