feat(contexto-py): LocalBackend for on-disk mindmap retrieval #153
Open
sm86 wants to merge 8 commits into
Implements the LocalBackend per the approved 2026-05-23 design:
embeddings + summarization call the user's OpenAI/OpenRouter provider, and
state lives on disk as a single JSON file. Selectable via CONTEXTO_BACKEND=local;
remote stays the default.
- 10 modules under src/contexto_hermes/local/ (mindmap_types, extractor,
store, labeler, embedder, summarizer, clustering, retrieval, backend, __init__)
- scipy AGNES (linkage average/cosine) + beam-search retrieval
- Atomic JSON writes with corrupt-file quarantine + version:1 schema
- Backend-aware register() reading CONTEXTO_BACKEND
- ContextoEngine.from_env_local() + ContextoConfig.local_mode_defaults()
- SearchResult.paths typed list[list[str]] (cluster labels)
- Search items wrapped as {item, score} mirroring TS ScoredQueryResult
- LocalBackendConfig.embed_model/llm_model nullable with provider-default fallback
- config_snapshot populated on every save (embed_model, mindmap tunables)
- Store quarantines non-dict stats explicitly
- 124 tests under tests/local/ (336 total in contexto-py, all green)
- Docker E2E harness (e2e/Dockerfile + run_local_e2e.py) verified against real OpenRouter
- pyproject.toml bumped 0.1.0 -> 0.2.0; adds numpy + scipy
- Spec §7 storage path resolved to honor $HERMES_HOME
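For readers unfamiliar with the clustering step, here is a minimal sketch of average-linkage agglomerative clustering (AGNES) on cosine distance using SciPy's `linkage` and `fcluster`. The function name `agnes_cosine`, the `max_clusters` cut, and the `maxclust` criterion are illustrative assumptions; the PR's `clustering.py` may cut or walk the dendrogram differently.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def agnes_cosine(embeddings: np.ndarray, max_clusters: int) -> np.ndarray:
    """Cluster embedding rows bottom-up with average linkage on cosine distance.

    Hypothetical helper: returns a 1-based flat cluster label per row.
    """
    if embeddings.shape[0] < 2:
        # linkage() needs at least two observations; a lone item is its own cluster.
        return np.ones(embeddings.shape[0], dtype=int)
    # AGNES: repeatedly merge the two clusters with the smallest average cosine distance.
    tree = linkage(embeddings, method="average", metric="cosine")
    # Cut the dendrogram so that at most `max_clusters` flat clusters remain.
    return fcluster(tree, t=max_clusters, criterion="maxclust")
```

Average linkage scores a merge by the mean distance between all cross-cluster pairs, which keeps it less sensitive to a single outlier embedding than single linkage.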
- e2e/run_local_e2e.py: SearchResult.items are {"item": ..., "score": ...}
wrappers per spec §5. The script read item["content"] directly, which
raised KeyError after a successful real-provider search. It now unwraps each
result and also logs the score so the rank ordering is visible.
- e2e/docker-compose.hermes-local.yml: the hermes-agent repo's
plugins/context_engine/contexto symlink works for the local hermes CLI but
not inside a Docker image — docker build's COPY resolves the absolute
symlink to a host path the container can't see, leaving a broken link at
the plugin slot. Make the bind mount of src/contexto_hermes onto the plugin
path active and read-only.
- e2e/README.md: replace the misleading "no bind-mount needed" note with the
actual constraint, plus an alternative (copy the plugin tree into the image
before docker build) for users who don't want a bind mount.
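A minimal sketch of the unwrapping described above, assuming each wrapped `item` is a dict with a `content` key (the key the original script read). `log_ranked_results` is a hypothetical helper, not the script's actual code.

```python
from typing import Any, Dict, List


def log_ranked_results(items: List[Dict[str, Any]]) -> List[str]:
    """Unwrap {"item": ..., "score": ...} search results and log them in rank order."""
    lines = []
    for rank, wrapped in enumerate(items, start=1):
        item = wrapped["item"]    # the stored episode, not the wrapper
        score = wrapped["score"]  # relevance score that determines the ordering
        line = f"{rank}. score={score:.3f} {item['content']}"
        print(line)
        lines.append(line)
    return lines
```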
…README
- clustering.py: Clusterer reset its counter to 1 on every construction. After a process restart, an incremental insert against a loaded tree produced a fresh `cluster-1` under root alongside the persisted `cluster-1`. Replace the itertools.count with a plain `_next_id` int and fast-forward it past the largest `cluster-N` already in the tree on every add() that has a root. Idempotent across multiple adds.
- tests/local/test_clustering.py: new TestClusterIdUniquenessAfterReload reproduces the reviewer's scenario (100 items, rebuild_interval=1000, fresh Clusterer against loaded state) and asserts unique ids across reload + multiple incremental adds.
- README.md: was still remote-only — said "Only CONTEXTO_API_KEY is required" and described the remote env vars exclusively. Document both backends, the CONTEXTO_BACKEND selector, the local provider/key resolution rules, and the full CONTEXTO_LOCAL_* table so users installing 0.2.0 can enable the new backend from the package README alone.
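A minimal sketch of the id fast-forward, assuming a tree of nodes whose ids look like `cluster-N`. `Node` and `ClusterIdAllocator` are hypothetical stand-ins; in the PR the logic lives inside `Clusterer.add()`.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

_CLUSTER_ID = re.compile(r"^cluster-(\d+)$")


@dataclass
class Node:
    id: str
    children: List["Node"] = field(default_factory=list)


class ClusterIdAllocator:
    """Issue cluster-N ids that cannot collide with ids already in a loaded tree."""

    def __init__(self) -> None:
        self._next_id = 1  # plain int instead of itertools.count

    def _fast_forward(self, root: Node) -> None:
        # Iterative walk; move _next_id past the largest cluster-N seen.
        stack = [root]
        while stack:
            node = stack.pop()
            match = _CLUSTER_ID.match(node.id)
            if match:
                self._next_id = max(self._next_id, int(match.group(1)) + 1)
            stack.extend(node.children)

    def new_id(self, root: Optional[Node]) -> str:
        if root is not None:
            self._fast_forward(root)  # never moves backwards, so repeat calls are safe
        cluster_id = f"cluster-{self._next_id}"
        self._next_id += 1
        return cluster_id
```

Because the scan only ever raises `_next_id`, running it on every add is harmless when nothing was reloaded; the cost is one tree walk per insert.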
…unch
The hermes-agent base image doesn't ship numpy or scipy. Bind-mounting the
plugin source is necessary but not sufficient: on first agent request the
local backend fails to construct ("No module named 'numpy'") and the gateway
silently falls back to its built-in compressor.
Wrap the gateway command in `sh -c` so `uv pip install` runs against the
image's venv (already activated by the entrypoint) before exec'ing
`gateway run`. The install is idempotent — fast no-op on restarts.
Document both image gaps (plugin source + runtime deps) in the e2e README.
Adds docs/contexto-hermes-quickstart.md — a focused install guide for both backends. Covers prereqs, the shared pip+symlink step, the per-backend env vars, a verify step (log grep + jq on the mindmap), the two Docker gotchas (symlinked plugin source + missing numpy/scipy in the image), and a troubleshooting table for the common failure modes. Linked from docs/SUMMARY.md. Package README points to the new doc for the copy-paste walkthrough. Also normalizes the install command in the README to `contexto-hermes-install` (the script entry point shipped in pyproject.toml) instead of `python -m contexto_hermes.install` — both work, but the script form is shorter and is what the quickstart uses.
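The verify step inspects the mindmap file, so it helps to know where that file lands. A sketch of the path resolution, assuming `HERMES_HOME` falls back to `~/.hermes` when unset (inferred from the locations quoted in the PR summary); `mindmap_path` is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def mindmap_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the on-disk mindmap location, honoring $HERMES_HOME.

    Assumption: HERMES_HOME defaults to ~/.hermes when unset or empty.
    """
    env = os.environ if env is None else env
    hermes_home = env.get("HERMES_HOME")
    base = Path(hermes_home) if hermes_home else Path.home() / ".hermes"
    return base / "data" / "contexto" / "mindmap.json"
```

With `HERMES_HOME=/opt/data`, as in the Hermes container, this yields `/opt/data/data/contexto/mindmap.json`.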
The hermes-agent entrypoint wraps a bare `gateway run` as `hermes gateway run` only when `gateway` is the first container arg (its `command -v "$1"` check fails, so it falls through to `exec hermes "$@"`). Inside our `sh -c` wrapper the first arg is `sh`, which resolves on PATH, so the entrypoint execs `sh` directly and never reaches the `hermes` fallback. `gateway` isn't a standalone binary, so the container died with `sh: exec: gateway: not found`. Fix: call `hermes gateway run` explicitly inside the `sh -c`. `hermes` is on PATH because the entrypoint activates /opt/hermes/.venv before exec. Same fix applied to the Docker snippet in docs/contexto-hermes-quickstart.md.
…s to 0.1.0
Top-level README:
- Quick Start now has two subsections: OpenClaw (existing) and Hermes (new —
pip install + contexto-hermes-install + config.yaml + key + run). Points to
docs/contexto-hermes-quickstart.md for the fully local setup.
- Hero copy, "Why Contexto", "What You Get", "Who Should Use This", and the
Quick Start lead now mention both runtimes instead of OpenClaw only.
- Roadmap no longer lists "Local backend" — shipped in this release.
Version renumber 0.2.0 -> 0.1.0:
- contexto-hermes was never on PyPI as 0.1.0; the bump to 0.2.0 was internal
noise. Renumber so the first public release lands as 0.1.0.
- pyproject.toml, __init__.py, plugin.yaml, types.py docstring, README,
test_plugin_yaml.py comment, and the v0.2.0+ marker in the quickstart all
updated.
Verification: 338 passed, 2 skipped. `python -m build` produces clean
contexto_hermes-0.1.0.{tar.gz,whl}; `twine check` passes both.
Summary
- `LocalBackend` for `contexto-hermes`: embeddings + summarization call the user's OpenAI/OpenRouter provider; state lives on disk as a single JSON file; selectable via `CONTEXTO_BACKEND=local` (remote stays the default).
- `src/contexto_hermes/local/` (extractor, store, labeler, embedder, summarizer, scipy AGNES clustering, beam-search retrieval, backend orchestrator); 124 new tests under `tests/local/` covering provider/key matrix, error contract, round-trip, store quarantine, and beam-search shape; 336 total in `contexto-py`, all green.
- `pyproject.toml` bumped `0.1.0 → 0.2.0` and adds `numpy` + `scipy`; `plugin.yaml` declares `CONTEXTO_LOCAL_*` env vars.
- Storage path honors `$HERMES_HOME` (resolves to `~/.hermes/data/contexto/mindmap.json` locally, `/opt/data/data/contexto/mindmap.json` in the Hermes container).
- Search items wrapped as `{item, score}` for TS `ScoredQueryResult` parity; `config_snapshot` populated on every save; `embed_model`/`llm_model` nullable with provider-default fallback.

Test plan
- `pytest tests/`: 336 passed, 2 skipped (existing remote smoke tests, gated on `CONTEXTO_API_KEY`)
- `tests/local/test_config.py`
- `CONTEXTO_BACKEND=local` with/without key, invalid value, `remote` unchanged (`tests/local/test_register.py`)
- `tests/local/test_round_trip.py`
- `search` returns `None` before retrieval is called
- `tests/local/test_backend.py`
- Docker E2E harness (`e2e/Dockerfile` + `e2e/run_local_e2e.py`): verified against real OpenRouter; Kubernetes-shaped episodes outrank an unrelated restaurant one
- `CONTEXTO_BACKEND=local` end-to-end (recipe in `e2e/docker-compose.hermes-local.yml`; needs an OpenRouter or OpenAI key)
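The round-trip and store-quarantine coverage above concerns how the single JSON file is written and reloaded. Below is a minimal sketch of the usual pattern: write to a temp file in the same directory, fsync, then `os.replace`; on load, move an unreadable or wrong-version file aside instead of crashing. `save_state`, `load_state`, and the `.corrupt-<timestamp>` suffix are hypothetical; the PR's store may name and structure these differently.

```python
import contextlib
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Any, Dict

SCHEMA_VERSION = 1


def save_state(path: Path, state: Dict[str, Any]) -> None:
    """Atomically write state plus the schema version to `path`."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {**state, "version": SCHEMA_VERSION}
    # The temp file lives in the target directory so os.replace never crosses filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())
        # On POSIX, readers see either the old file or the new one, never a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


def load_state(path: Path) -> Dict[str, Any]:
    """Load state, quarantining files that are not valid version-1 JSON objects."""
    if not path.exists():
        return {"version": SCHEMA_VERSION}
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict) or data.get("version") != SCHEMA_VERSION:
            raise ValueError("unexpected mindmap schema")
    except ValueError:  # also covers JSONDecodeError and UnicodeDecodeError
        quarantined = path.with_name(f"{path.name}.corrupt-{int(time.time())}")
        os.replace(path, quarantined)  # keep the bad file around for inspection
        return {"version": SCHEMA_VERSION}
    return data
```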