Add update-detection watchdog (no auto-apply)#7

Merged: lai3d merged 1 commit into main from claude/self-update-watchdog on May 18, 2026

@lai3d lai3d commented May 18, 2026

Summary

Adds an opt-in watchdog that periodically checks a version-manifest URL for an available sigma-agent update and surfaces the signal via Prometheus metrics and an MCP tool. Detection only: it never downloads, restarts, or applies anything. Acting on the signal is left to the operator.

What

New module sigma-agent/src/watchdog.rs:

  • watchdog_loop() GETs the configured manifest URL (5s timeout) once per check interval and parses { "version": "...", "binary_url": "...", "sha256": "..." }
  • compare_versions() does component-wise numeric comparison on x.y.z strings (no semver dep)
  • check_once() exposed for on-demand MCP calls (force=true)
  • Result stored in Arc<RwLock<UpdateInfo>> snapshot
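For reviewers who want the comparison rule spelled out, here is a minimal sketch of component-wise numeric comparison using only the standard library. It is not the code in watchdog.rs: the argument order (current, latest), the bool return, and the helper parse_components are assumptions made for illustration. It treats any unparsable component or a differing component count as "no update".

```rust
/// Hypothetical helper: split on '.' and parse every component as u32.
/// Returns None if any component fails to parse (e.g. "garbage", "1.x.0").
fn parse_components(version: &str) -> Option<Vec<u32>> {
    version
        .split('.')
        .map(|part| part.parse::<u32>().ok())
        .collect()
}

/// Sketch: true only when `latest` is strictly newer than `current`.
/// Assumed argument order; the real signature may differ.
pub fn compare_versions(current: &str, latest: &str) -> bool {
    let (Some(cur), Some(new)) = (parse_components(current), parse_components(latest)) else {
        // Parse failure on either side: never claim an update.
        return false;
    };
    if cur.len() != new.len() {
        // Component-count mismatch (e.g. "1.2" vs "1.2.3") is treated as ambiguous.
        return false;
    }
    // Vec<u32> ordering is lexicographic, which for equal lengths is
    // exactly a left-to-right component-wise numeric comparison.
    new > cur
}
```

Comparing parsed integers rather than strings is what makes "1.10.0" sort after "1.9.0"; a plain string comparison would get that case wrong.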

Wire-up:

  • --update-check / AGENT_UPDATE_CHECK (default false)
  • --update-manifest-url / AGENT_UPDATE_MANIFEST_URL (default https://lai3d.github.io/sigma/agent-version.json)
  • --update-check-interval / AGENT_UPDATE_CHECK_INTERVAL (default 3600s)
  • Prometheus: sigma_agent_update_available{current_version, latest_version} (gauge: 1 = update available, 0 = none or unknown), sigma_agent_update_last_check_timestamp
  • MCP: new tool agent_check_update with optional force: bool (immediate check vs cached snapshot)
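To make the metric shape concrete, the following sketch renders the two series in the Prometheus text exposition format by hand. The agent most likely registers these through whatever metrics layer it already uses, so render_update_metrics and escape_label are hypothetical names, and expressing the timestamp in Unix seconds is an assumption.

```rust
/// Escape a label value per the Prometheus text format:
/// backslash, double quote, and newline must be escaped.
fn escape_label(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

/// Hypothetical renderer for the two series named in this PR.
/// `last_check_unix` is assumed to be seconds since the Unix epoch.
pub fn render_update_metrics(
    current: &str,
    latest: &str,
    available: bool,
    last_check_unix: u64,
) -> String {
    format!(
        "sigma_agent_update_available{{current_version=\"{}\",latest_version=\"{}\"}} {}\n\
         sigma_agent_update_last_check_timestamp {}\n",
        escape_label(current),
        escape_label(latest),
        u8::from(available), // true -> 1, false -> 0
        last_check_unix
    )
}
```

With this shape, an alert rule only needs to match sigma_agent_update_available == 1, and the labels tell the operator which versions are involved.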

Design — detection only, deliberately

  • Never auto-downloads, never auto-restarts, never auto-applies. Auto-update on a fleet of VPS instances is a recipe for outages: a bad release that boots fine but breaks under load takes down the whole fleet at once. This watchdog surfaces a signal; the operator decides when and how to roll out.
  • Graceful degradation. Network errors, non-2xx responses (such as a 404), and malformed JSON all log a warning and set update_available: false with last_error populated. The agent's regular operation is never affected.
  • Bounded cost. One HTTP call per hour (default), one Arc snapshot in memory — well within the agent's <1% CPU / <50 MiB RSS budget.
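The degradation rule above can be sketched as a single function that folds a check outcome into the shared snapshot. The field names update_available and last_error come from this description; every other field, the record_check name, and the shape of the outcome argument are assumptions made for illustration, and the real UpdateInfo may look different.

```rust
use std::sync::{Arc, RwLock};

/// Assumed shape of the snapshot; only `update_available` and
/// `last_error` are named in the PR description.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct UpdateInfo {
    pub update_available: bool,
    pub latest_version: Option<String>,
    pub last_error: Option<String>,
    pub last_check_unix: u64,
}

/// Hypothetical helper: store the result of one check.
/// `outcome` is Ok((latest_version, is_newer)) on success, or
/// Err(message) for a network error, non-2xx response, or bad JSON.
pub fn record_check(
    snapshot: &Arc<RwLock<UpdateInfo>>,
    outcome: Result<(String, bool), String>,
    now_unix: u64,
) {
    let next = match outcome {
        Ok((latest, newer)) => UpdateInfo {
            update_available: newer,
            latest_version: Some(latest),
            last_error: None,
            last_check_unix: now_unix,
        },
        // Any failure degrades to "no update known" and records why.
        Err(message) => UpdateInfo {
            update_available: false,
            latest_version: None,
            last_error: Some(message),
            last_check_unix: now_unix,
        },
    };
    // Recover the guard from a poisoned lock instead of panicking,
    // so a failure elsewhere cannot take the watchdog down.
    let mut guard = snapshot
        .write()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    *guard = next;
}
```

Replacing the whole struct under one short write lock keeps readers (the metrics endpoint and the MCP tool) from ever seeing a half-updated snapshot.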

Test plan

  • cargo check --no-default-features — clean
  • cargo test --no-default-features — 33/33 pass (8 new compare_versions tests covering the spec'd cases plus minor/major bump + garbage input)
  • README.md updated with new ## Self-Update Watchdog section (manifest schema, flags, metrics, MCP tool, detection-only contract)
  • Live test with a real manifest URL (separate step)

Part of the agent roadmap.

Introduces an opt-in self-update watchdog (--update-check) that polls a
version-manifest JSON URL on a configurable interval (default 1h, 5s HTTP
timeout) and exposes the result as Prometheus gauges and an MCP tool.

Detection-only by design: never downloads, restarts, or applies anything.
Operators are notified via sigma_agent_update_available +
sigma_agent_update_last_check_timestamp and the agent_check_update MCP
tool, and consciously act on the signal. Network errors, non-2xx
responses, and malformed JSON degrade gracefully — last_error is set,
the gauge falls to 0 (unknown), the watchdog never panics or blocks the
heartbeat loop.

Version comparison splits on '.' and compares each component as a u32
(no semver crate); any parse failure or component-count mismatch
returns false to avoid ambiguous "update" claims.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lai3d lai3d force-pushed the claude/self-update-watchdog branch from 1c05b00 to ae561a3 Compare May 18, 2026 17:49
@lai3d lai3d merged commit 3544cc1 into main May 18, 2026
1 check passed
@lai3d lai3d deleted the claude/self-update-watchdog branch May 18, 2026 17:50