feat(agent): knowledge manager server setup with local daemon, XState pipeline, and Mastra agent loop by horacioh · Pull Request #588 · seed-hypermedia/seed

horacioh · 2026-05-11T14:09:55Z

Summary

- Knowledge Manager skill (seed-knowledge-manager/SKILL.md): LAFH/GC-Red methodology implementation for Seed communities, covering synthesis docs, periodic bulletins, gap detection, onboarding, expertise maps, and health reports
- Agent server (seed-knowledge-manager/agent/): autonomous Moderador de Redes (network moderator) running on oc.hyper.media with full systemd setup, an operator Telegram bot, and audit logging
- CLI site commands (frontend/apps/cli/src/commands/site.ts): subscribe, unsubscribe, list-subscriptions, sync-status, and reconcile commands for managing local daemon subscriptions
- API schema (frontend/packages/client/src/hm-types.ts): Zod schemas and types for the Subscribe, Unsubscribe, ListSubscriptions, and ForceSync RPCs
- API implementations (frontend/packages/shared/src/api-subscriptions.ts, api-force-sync.ts): wire these RPCs through the Remix /api/<RPC> surface
- Workstream A (KM_USE_LOCAL_DAEMON): preflight sync-status gate; km-reconcile.timer for periodic ForceSync; scheduler hot-tier promotion (--syncing.subscription-hot-tier) so capability blobs converge faster
- Workstream B (KM_USE_STATE_MACHINE): XState v5 per-mention actor with retry/backoff and JSONL crash-resume, replacing the ad-hoc two-pass poll loop
- Workstream C (KM_USE_MASTRA_AGENT): bounded DeepSeek tool-call loop (≤10 calls plus a forced final_answer) for dynamic context expansion; multi-turn Telegram history
- SQLite busy_timeout (backend/storage/sqlite.go): 5s timeout to prevent SQLITE_BUSY on headless agent VMs
- LAFH research doc (docs/research-lafh-knowledge-management.md): full theoretical grounding for the GC-Red methodology
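
Workstream B's per-mention actor retries failed steps with backoff, but the PR text does not give the schedule. The following is a minimal sketch of a capped exponential backoff such an actor could use for its retry delay; `BASE_DELAY_MS`, `MAX_DELAY_MS`, `MAX_ATTEMPTS`, and `retryDelayMs` are hypothetical, not taken from the PR.

```typescript
// Hypothetical retry parameters; the PR does not specify the real ones.
const BASE_DELAY_MS = 2_000;
const MAX_DELAY_MS = 60_000;
const MAX_ATTEMPTS = 5;

/**
 * Delay before retry number `attempt` (1-based): 2s, 4s, 8s, ... capped at MAX_DELAY_MS.
 * Returns null once the attempt budget is exhausted, signalling the actor to give up.
 */
function retryDelayMs(attempt: number): number | null {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError(`attempt must be a positive integer, got ${attempt}`);
  }
  if (attempt > MAX_ATTEMPTS) return null;
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}
```

In XState v5, a delay function like this could be registered under `delays` in `setup()` and referenced by name from an `after` transition; that wiring is not shown here.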

Breaking Changes

None. All three workstreams are behind feature flags (KM_USE_LOCAL_DAEMON, KM_USE_STATE_MACHINE, KM_USE_MASTRA_AGENT) defaulting off; existing paths unchanged.
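
All three workstreams sit behind environment flags that default to off. A minimal sketch of how a driver might read them; the helper names are hypothetical, and treating only `1`/`true` (any case) as "on" is an assumed convention, not something the PR states.

```typescript
type KmFlags = {
  useLocalDaemon: boolean;
  useStateMachine: boolean;
  useMastraAgent: boolean;
};

// Assumed convention: only "1" or "true" (any case) enable a flag; unset or anything else means off.
function flagOn(value: string | undefined): boolean {
  if (value === undefined) return false;
  const v = value.trim().toLowerCase();
  return v === "1" || v === "true";
}

// Pass process.env (or a test object); missing variables keep the default-off behaviour.
function readKmFlags(env: Record<string, string | undefined>): KmFlags {
  return {
    useLocalDaemon: flagOn(env.KM_USE_LOCAL_DAEMON),
    useStateMachine: flagOn(env.KM_USE_STATE_MACHINE),
    useMastraAgent: flagOn(env.KM_USE_MASTRA_AGENT),
  };
}
```

Defaulting every unknown value to off keeps the existing code paths in place unless an operator opts in explicitly.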

…th agent scaffolding

Add a Seed Knowledge Manager skill implementing Luis Ángel Fernández Hermana's network knowledge management methodology, including agent infrastructure templates, governance document templates, research deep-dive document, and deployment scaffolding across 6 agent directories.

…pose and systemd unit

Introduce a drop-in `secret-tool` shim so seed-cli can store/lookup secrets via a JSON file in headless environments where the OS keyring is unavailable. Add a `seed-web` service to the Docker Compose stack so the agent's seed-cli talks to the Remix frontend (port 3000) instead of directly to the daemon's raw gRPC-Web surface. Wire both services onto a shared `seed-net` bridge network and document the updated architecture.
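
The shim itself is not shown in the PR. `secret-tool` identifies a secret by attribute/value pairs, so a JSON-file-backed replacement needs a stable way to turn those pairs into a key. A minimal sketch of that store/lookup logic; the flat key-to-secret file layout and the helper names are hypothetical, reading and writing the JSON file is left out, and the exact-match lookup is a simplification of how a real keyring searches.

```typescript
// Hypothetical on-disk layout: a flat JSON object mapping canonical attribute keys to secrets.
type SecretDb = Record<string, string>;

// Canonical key for an attribute set: sort by attribute name so argument order doesn't matter.
function attrKey(attrs: Record<string, string>): string {
  return Object.keys(attrs)
    .sort()
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(attrs[k])}`)
    .join("&");
}

// Counterpart of `secret-tool store`: returns a new db with the secret set (input unchanged).
function storeSecret(db: SecretDb, attrs: Record<string, string>, secret: string): SecretDb {
  return { ...db, [attrKey(attrs)]: secret };
}

// Counterpart of `secret-tool lookup`: undefined unless the attribute set matches exactly.
function lookupSecret(db: SecretDb, attrs: Record<string, string>): string | undefined {
  return db[attrKey(attrs)];
}
```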

Add the Bun-built TypeScript wrapper that fronts seed-cli for the Knowledge Manager agent. Provides:
- governance loader (parses YAML rules out of fixed-path Seed docs, 60s TTL cache)
- path allow/deny matcher with hardcoded denylist over the four governance docs themselves
- per-day rate counters (per-run counter intentionally not persisted)
- mention parser for Seed comment annotations + reply-target builder
- audit-log per process invocation (meta.json + trace.jsonl + llm.jsonl + tools.jsonl + seed-cli.jsonl, with secret redaction)
- typed seed-cli wrapper with verb-pair denylist (key:generate, capability:create, …)
- stdio MCP server entry exposing the read/write/state tools to a potential nanobot host

44 unit tests under bun:test cover the matcher, governance parsing, mention detection, redaction, state, and the seed-cli denylist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
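
A minimal sketch of an allow/deny path matcher in the spirit of the one described: the hardcoded governance denylist cannot be overridden, configured deny rules beat allow rules, and an allow entry of `/` covers everything below the root (the semantics a later cadence commit settles on). The four governance doc paths are not named in this PR, so `GOVERNANCE_DOCS` holds placeholder values; matching on path-segment boundaries is an assumption.

```typescript
// Placeholder paths: the real four governance docs are not named in this PR.
const GOVERNANCE_DOCS = [
  "/governance-placeholder/rules",
  "/governance-placeholder/charter",
  "/governance-placeholder/moderation",
  "/governance-placeholder/agents",
];

// True when `path` equals `prefix` or sits below it on a segment boundary.
// A prefix of "/" matches every absolute path (everything below root).
function underPrefix(path: string, prefix: string): boolean {
  if (prefix === "/") return path.startsWith("/");
  const p = prefix.endsWith("/") ? prefix.slice(0, -1) : prefix;
  return path === p || path.startsWith(p + "/");
}

function canWrite(path: string, allow: string[], deny: string[]): boolean {
  // Hardcoded denylist: checked first so no configuration can re-enable it.
  if (GOVERNANCE_DOCS.some((doc) => underPrefix(path, doc))) return false;
  // Configured deny rules win over allow rules.
  if (deny.some((d) => underPrefix(path, d))) return false;
  return allow.some((a) => underPrefix(path, a));
}
```

Matching on segment boundaries means an allow entry of `/agents` does not accidentally grant `/agents-archive`.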

HKUDS/nanobot config that wires the MCP wrapper as the agent's only tool surface: DeepSeek as the LLM, restrictToWorkspace + bwrap sandbox on the built-in shell/files/web tools, and the custom MCP server registered as "seed". Pinned to port 18791 (the default 18790 was already taken by another container on this multi-tenant host). Telegram channel intentionally disabled here; Phase 7 ships a dedicated Telegram driver instead.

The system prompt teaches DeepSeek the LAFH-driven /poll-mentions verb, enforces hard rules from the rules doc, and disables file/exec/web skills via disabledSkills.

NOTE: nanobot ended up not being on the critical path. The polling loop was reimplemented as a deterministic Bun driver in Phase 5 because nanobot's tool-result-spilled-to-disk pattern made DeepSeek loop on read_file/grep instead of replying. The gateway service is kept around as an idle, optional REPL surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…6.5)

Bun-built standalone driver that bypasses nanobot for the polling loop. Two-pass design:

Pass A: placeholders (deterministic, ~1-2s per mention)
Walk the activity feed, fetch each candidate comment, look for Embed annotations linking to the agent's account or the site root, filter by writer-capability holders, and post a "Working on this — back in a moment. ⌛" comment as a threaded reply (falling back to a top-level comment on the seed-cli `--reply` Non-base58btc bug). Persist the placeholder mapping in km-state/placeholders.jsonl so a crash between passes is recoverable.

Pass B: finalisation (one DeepSeek call per pending placeholder)
Site-search for relevant docs, inject them as context, call DeepSeek to draft the reply, and edit the placeholder via `seed-cli comment edit`. Cite hm:// URLs inline. Fall back to a fixed message on DeepSeek failure so a placeholder is never stranded on "Working…".

Key seed-cli quirks worked around:
- `comment create` writes the success message ("✓ Comment published: <CID>") to stderr, not stdout, and the value is the version CID, not the record id. We parse stderr, then run `comment get <CID>` to read back the canonical record id for later editing.
- Activity feed `--resource` is exact-match. Filtering by site root hides comments on /discussions/* etc. We pull unfiltered events and post-filter by `comment.targetAccount`.
- The cursor model is reverse-chronological pagination, not "since last poll". State stores the newest event id we've classified.

systemd timer at 15s cadence; unit timeout 180s. The km-log helper and logrotate user config also land here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
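
Two pieces of this flow are easy to get wrong. A minimal sketch of (1) pulling the version CID out of `comment create`'s stderr and (2) finding placeholders that Pass A posted but Pass B never finalised, by replaying a JSONL log. The stderr wording is quoted from the commit above; the record shape (`{ mentionId, placeholderId, status }`) is a hypothetical stand-in for whatever placeholders.jsonl actually stores.

```typescript
// Matches the success line seed-cli writes to stderr, e.g. "✓ Comment published: <CID>".
// Returns the version CID; the record id still has to be read back via `comment get <CID>`.
function parsePublishedCid(stderr: string): string | null {
  const m = stderr.match(/Comment published:\s*(\S+)/);
  return m ? m[1] : null;
}

// Hypothetical record shape for placeholders.jsonl.
type PlaceholderRecord = {
  mentionId: string;
  placeholderId: string;
  status: "placeholder" | "finalised";
};

// Replays the log in order; the last record per mention wins. Anything still
// "placeholder" after replay is pending and should go through Pass B.
function pendingPlaceholders(jsonl: string): PlaceholderRecord[] {
  const latest = new Map<string, PlaceholderRecord>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const rec = JSON.parse(line) as PlaceholderRecord;
      latest.set(rec.mentionId, rec);
    } catch {
      // A crash mid-write can leave a truncated last line; skip it.
    }
  }
  return Array.from(latest.values()).filter((r) => r.status === "placeholder");
}
```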

Single Bun driver `cadence-cli.ts` selected by the KM_TASK env var:
- boletin (Mon 09:00 UTC, 7-day window): weekly bulletin published at /agents/knowledge-manager/state/boletin/<YYYY-Www>
- gap (Wed 10:00 UTC, 7-day window): gap report at /agents/knowledge-manager/state/gaps/<YYYY-MM-DD>
- health (1st of month 09:00 UTC, 30-day window): network health report at /agents/knowledge-manager/state/network-health/<YYYY-MM>

Pattern per task: load governance → respect the draft_only kill-switch + allow/deny path rules → collect an activity snapshot → one DeepSeek call with the LAFH template skeleton and the snapshot as context → publish via `seed-cli document create --force`.

The activity snapshot tolerates the daemon's mixed ID serialisation (string vs {id, uid, path[]}). Fixed `/` in `allow_write_paths` to mean "everything below root" instead of just the root literal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
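
The bulletin path uses an ISO-8601 week label (`<YYYY-Www>`). Deriving it from the calendar year is wrong around New Year, because an ISO week belongs to the year that contains its Thursday. A minimal sketch of the label computation in UTC; the helper name `isoWeekLabel` is hypothetical.

```typescript
// ISO-8601 week label for a date, e.g. "2026-W20", computed in UTC.
function isoWeekLabel(date: Date): string {
  // Copy to UTC midnight so the input is not mutated and local time zones don't matter.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  // ISO weekday: Monday = 1 ... Sunday = 7.
  const weekday = d.getUTCDay() || 7;
  // Move to the Thursday of this ISO week; its year is the ISO week-year.
  d.setUTCDate(d.getUTCDate() + 4 - weekday);
  const weekYear = d.getUTCFullYear();
  const yearStart = Date.UTC(weekYear, 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return `${weekYear}-W${String(week).padStart(2, "0")}`;
}
```

For example, Friday 2021-01-01 falls in 2020-W53 and Monday 2024-12-30 in 2025-W01, so a Monday run near New Year can publish under the "wrong" calendar year if this is skipped.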

Long-running Bun driver that long-polls the Telegram REST API for the operator chat(s) configured in OPS_TELEGRAM_ID (allowFrom is matched against either the sender's user-id or the chat-id, so DMs and groups both work). Three modes:
- slash commands: /status /last-runs /show-rules /poll-now /help
- /ask <q>: operator-mode Q&A grounded in:
  - README excerpt (~3.5KB)
  - last 8 audit run summaries
  - current governance rules JSON
  No site search; a different system prompt biased to cite filenames + systemd units explicitly.
- plain text: community-mode Q&A grounded in seed-cli search results (same pipeline as comment-mention replies)

Both Q&A modes preserve a per-chat conversation history (last 10 turns, JSONL under km-state/telegram-history/<chatId>.jsonl, auto-rotated above 256KB).

DeepSeek logic factored into reply-engine.ts (shared with poll-cli):
- callDeepSeek(messages, opts)
- gatherSiteContext(cli, question, site)
- draftReply(question, siteContext, audit, history)
- draftSystemReply(question, systemContext, audit, history)

Each Telegram exchange writes a complete audit run under ~/km-logs/runs/<…>__telegram-question or telegram-ask__<ulid>/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
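
Two small pieces of the Telegram driver's logic, sketched under assumptions: matching `allowFrom` against either the sender's user id or the chat id (so both DMs and groups pass), and keeping only the last 10 turns of per-chat history when building the prompt. Parsing OPS_TELEGRAM_ID as a comma-separated list, the `Turn` shape, and all helper names are hypothetical; the PR specifies neither format.

```typescript
// Assumption: OPS_TELEGRAM_ID holds one or more ids separated by commas.
function parseAllowFrom(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s !== ""),
  );
}

// Telegram ids are numbers (group chat ids are negative); compare them as strings.
function isAllowed(allowFrom: Set<string>, userId: number, chatId: number): boolean {
  return allowFrom.has(String(userId)) || allowFrom.has(String(chatId));
}

// Hypothetical shape of one persisted exchange in telegram-history/<chatId>.jsonl.
type Turn = { question: string; answer: string };

const MAX_TURNS = 10;

// Converts the most recent turns into chat messages, oldest first.
function historyMessages(turns: Turn[]): { role: "user" | "assistant"; content: string }[] {
  return turns.slice(-MAX_TURNS).flatMap((t) => [
    { role: "user" as const, content: t.question },
    { role: "assistant" as const, content: t.answer },
  ]);
}
```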

Rewrite the agent README as the canonical operator runbook:
- as-built ASCII architecture diagram
- governance docs (the four Seed-side policy files) + kill-switch
- operator quick-reference (one-liners for log browsing, manual triggers, gateway restart)
- end-to-end bootstrap from scratch (10 numbered steps), each with the divergences we discovered along the way called out as inline NOTEs (Node 22 NodeSource, daemon -keystore-dir, secret-tool shim, seed-web container config.json, agent profile via Vault, port 18791 conflict, etc.)
- Telegram bot (/help /status /last-runs /show-rules /poll-now /ask <q> + plain-text community Q&A) with security note
- 22-row verification matrix marking everything verified live on production
- known issues + workarounds (seed-cli --reply Non-base58btc, agent home-doc creation 500, comment-create stderr quirk, activity --resource exact-match, cursor model, nanobot vs deterministic driver tradeoffs)
- complete repo layout reference

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ith thread + linked docs

Replace the cursor-based activity polling with a time-window scan that uses `processed.jsonl`/`placeholders.jsonl` for idempotency, fixing an eventual-consistency bug where new comments were missed. Extend `gatherCommentReplyContext` to include the parent document body, the full comment thread (walking the replyParent chain), 1-hop linked documents/profiles, and site search results. Reduce the systemd timer cadence from 15s to 30s.
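
A minimal sketch of the time-window scan: instead of trusting a cursor, each run looks back over a fixed window and relies on the processed/placeholder sets for idempotency, so a comment that becomes visible late is still picked up by a later run. The event shape, `WINDOW_MS`, and `candidatesToHandle` are hypothetical; the real window length is not stated in the commit.

```typescript
// Hypothetical event shape; createdAt is in milliseconds since the epoch.
type ActivityEvent = { commentId: string; createdAt: number };

// Hypothetical look-back; it should comfortably exceed the 30s cadence plus propagation lag.
const WINDOW_MS = 15 * 60 * 1000;

function candidatesToHandle(
  events: ActivityEvent[],
  now: number,
  processed: Set<string>,
  placeholders: Set<string>,
): ActivityEvent[] {
  const since = now - WINDOW_MS;
  const seen = new Set<string>();
  return events.filter((e) => {
    if (e.createdAt < since) return false; // outside the look-back window
    if (processed.has(e.commentId) || placeholders.has(e.commentId)) return false; // already handled
    if (seen.has(e.commentId)) return false; // duplicate within this batch
    seen.add(e.commentId);
    return true;
  });
}
```

Overlapping windows mean most events are seen several times; the idempotency sets, not the scan, decide whether anything is posted.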

…d SEED-KM agent machinery

…d mention key fix

- Extract shared Seed Hypermedia markdown rules into seed-primer.ts and inject them into the community agent and cadence prompts so the LLM never emits bare hm:// URLs or malformed list blocks
- Auto-create missing parent index docs before cadence writes leaf docs, so the desktop navigator can drill into /agents/knowledge-manager/state/<kind>
- Fix MentionSupervisor actor map key mismatch on replay: derive the id from mentionKey(initialMention) instead of the filename; sanitize fs-unfriendly chars (/ \ :) in persisted JSONL filenames via sanitizeForFs()
- Raise MAX_COMMENT_FETCHES 60→200 to handle busier threads
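
The replay fix hinges on two helpers named in the commit, `mentionKey` and `sanitizeForFs`, whose bodies are not shown. The following is a minimal sketch of why the actor map must be keyed from the persisted mention rather than the file name: sanitising is lossy, so the file name cannot be turned back into the key. The `Mention` shape, the key format, the `_` replacement character, and the `actorFileName`/`replayKey` helpers are all hypothetical.

```typescript
// Hypothetical mention shape; the real Mention type carries more fields.
type Mention = { commentId: string; targetPath: string };

// Hypothetical key format; ids and paths can contain "/" or ":".
function mentionKey(m: Mention): string {
  return `${m.targetPath}:${m.commentId}`;
}

// Replace characters that are unsafe in file names; "_" is an assumed replacement.
function sanitizeForFs(s: string): string {
  return s.replace(/[\/\\:]/g, "_");
}

function actorFileName(m: Mention): string {
  return `${sanitizeForFs(mentionKey(m))}.jsonl`;
}

// On replay, key the actor map by the persisted initial mention, never by the file name.
function replayKey(persisted: { initialMention: Mention }): string {
  return mentionKey(persisted.initialMention);
}
```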

…aliases

Reduce MAX_TOOL_CALLS from 30 to 10 and introduce FORCE_FINAL_THRESHOLD so the model is forced to emit final_answer before the budget is exhausted. DeepSeek bursts 2-3 tool calls per step, so the old step-indexed gate fired too late.

Add an inline-content salvage fallback: when the model exhausts its budget without calling final_answer, recover the last substantive assistant message rather than returning nothing. Tighten prompts to match the reduced budget (≤8 tool calls, a single search, ≤2 doc fetches).

Gate the WRITER/allowlist invoker check behind the KM_ENFORCE_INVOKER_GATE env var (off by default) so the agent answers any commenter. When the gate is active, resolve comment authors to their principal via the local daemon, with a gateway fallback, to handle aliased device-key accounts.

Fix idempotency bug: check isProcessed/hasPlaceholderFor before the not-allowed classification to prevent duplicate entries in processed.jsonl for unprivileged authors who keep mentioning the agent.
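
A minimal sketch of the budget logic described: tool calls are counted individually rather than per step (so a burst of several calls in one step is fully counted), `final_answer` becomes mandatory once `FORCE_FINAL_THRESHOLD` is reached, and if the budget still runs out the last substantive assistant message is salvaged. `MAX_TOOL_CALLS = 10` comes from the commit; the threshold value, the 40-character "substantive" cut-off, the message shape, and the helper names are assumptions.

```typescript
const MAX_TOOL_CALLS = 10; // from the commit message
const FORCE_FINAL_THRESHOLD = 8; // assumed value; the commit names the constant but not its value

type StepDecision = "free" | "force_final" | "exhausted";

// What the next model step may do, given every tool call made so far
// (counting each call in a burst, not just steps).
function nextStep(toolCallsSoFar: number): StepDecision {
  if (toolCallsSoFar >= MAX_TOOL_CALLS) return "exhausted";
  if (toolCallsSoFar >= FORCE_FINAL_THRESHOLD) return "force_final";
  return "free";
}

type Msg = { role: "user" | "assistant" | "tool"; content: string };

// Fallback when the budget is exhausted without final_answer: reuse the most recent
// assistant message long enough to be worth posting (the cut-off is an assumption).
function salvageAnswer(messages: Msg[], minChars = 40): string | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === "assistant" && m.content.trim().length >= minChars) return m.content.trim();
  }
  return null;
}
```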

…ntion

Add a second trigger path in poll-cli so KM auto-responds when a comment is a reply inside a thread KM is already participating in, enabling multi-turn dialogue without forcing a re-mention every turn.
- Add `detectThreadReplyToKm` in mentions.ts: walks the replyParent chain up to 30 hops, returns the first KM-authored ancestor
- Add `buildThreadReplyMention`: builds a Mention from the full comment body with a `triggerSource: 'thread-reply'` discriminator
- Add `MentionTriggerSource` type; tag the existing mention path as `'mention'`
- Wire a per-cycle `replyChainCache` in poll-cli to avoid redundant CLI calls for sibling replies on the same thread
- Emit a `mention_via_thread_reply` audit event with ancestorCommentId
- Update the system prompt to skip re-introduction on follow-up turns
- Add thread-reply.test.ts: direct parent, transitive ancestor, negative cases, cache behavior, buildThreadReplyMention shape, cycle isolation
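
A minimal sketch of the ancestor walk: follow `replyParent` links up to 30 hops and return the first ancestor written by the agent, memoising lookups in a per-cycle cache so sibling replies in the same thread don't refetch the same parents. The `Comment` shape and the synchronous `fetchComment` callback are simplifications (the real code goes through seed-cli), and the visited-set loop guard is an addition of this sketch.

```typescript
type Comment = { id: string; author: string; replyParent?: string };

const MAX_HOPS = 30;

// Returns the first ancestor written by `kmAccount`, or null if none within MAX_HOPS.
function detectThreadReplyToKm(
  comment: Comment,
  kmAccount: string,
  fetchComment: (id: string) => Comment | null,
  cache: Map<string, Comment | null>, // create a fresh map per poll cycle
): Comment | null {
  const visited = new Set<string>([comment.id]);
  let parentId = comment.replyParent;
  for (let hop = 0; hop < MAX_HOPS && parentId !== undefined; hop++) {
    if (visited.has(parentId)) return null; // malformed chain: stop instead of looping
    visited.add(parentId);
    let parent = cache.get(parentId);
    if (parent === undefined) {
      parent = fetchComment(parentId);
      cache.set(parentId, parent);
    }
    if (parent === null) return null; // parent could not be fetched
    if (parent.author === kmAccount) return parent;
    parentId = parent.replyParent;
  }
  return null;
}
```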

…chains

Defer thread-reply mentions to a direct-reply pass (no placeholder→edit flow) to avoid "Non-base58btc character" errors caused by seed-cli passing a RecordID where a CID is expected when the reply parent is itself a threaded reply. Root cause documented in .ai/seed-cli-reply-chain-fix.md; this commit applies the client-side workaround until seed-cli is patched upstream.
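
A minimal sketch of the routing change: mentions tagged `'thread-reply'` skip the placeholder-then-edit flow and go to a separate direct-reply pass, while ordinary mentions keep the two-pass flow. Only the `triggerSource` values come from the commits; the trimmed-down `Mention` shape and `routeMentions` are hypothetical.

```typescript
type MentionTriggerSource = "mention" | "thread-reply";

// Trimmed-down, hypothetical Mention shape for illustration.
type Mention = { commentId: string; triggerSource: MentionTriggerSource };

// Split one poll cycle's mentions into the placeholder→edit queue and the direct-reply queue.
function routeMentions(mentions: Mention[]): { twoPass: Mention[]; directReply: Mention[] } {
  const twoPass: Mention[] = [];
  const directReply: Mention[] = [];
  for (const m of mentions) {
    (m.triggerSource === "thread-reply" ? directReply : twoPass).push(m);
  }
  return { twoPass, directReply };
}
```

Keeping the split on a single discriminator means the workaround can be removed by routing everything back to `twoPass` once seed-cli is fixed upstream.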

horacioh force-pushed the knowledge-agent-server-setup branch from 5fa3f71 to 7d4d611 May 12, 2026 08:18

horacioh and others added 16 commits May 14, 2026 15:30

- feat(agent): add Phase 1 server bootstrap with seed-daemon Docker compose and systemd unit (50a5cf9)
- feat: add site subscription commands, agent subscription-hot-tier, and SEED-KM agent machinery (cab3628)
- fmt (ce7b05d)

horacioh force-pushed the knowledge-agent-server-setup branch from 4773979 to e323e63 May 14, 2026 13:30

horacioh wants to merge 16 commits into main from knowledge-agent-server-setup

horacioh commented May 11, 2026
