Genome-based context compression for local LLMs, with a machine-tagged know/miss agent contract.
Coordinate-index engine for LLM agents. Helix retrieves, weighs, and compresses your codebase into a context window — without a single LLM call on the retrieval path.
- Compression: collapses ~9k tokens of raw working set into a ~600 effective-token assembled context (28.7× headline on production workloads, 5.4× median across 15 query shapes).
- Agent contract: every `/context` response carries a top-level `know {}` (grounded, you may answer) or `miss {}` (`do_not_answer_from_genome: true`, plus `escalate_to` tools or `refresh_targets` paths). Stale `know` blocks downgrade to `miss` (`reason = "stale" | "cold" | "superseded"`) via the Stage 7 freshness gate.
- LLM-free retrieval: `/context` runs spaCy NER, SQLite FTS5 BM25, BGE-M3 dense recall, RRF fusion, Howard-2005 TCM, and Hebbian co-activation — zero ribosome calls. The only LLM call is downstream at `/v1/chat/completions`.
- Three install paths: compact tray flow (`start-helix-tray.bat`) for the daily driver, proxy-only for `OPENAI_BASE_URL` redirection, agent-SDK fragment for frontier-model integration.
- Setup guide — extras matrix, OS-specific install paths, calibration runbook
- Troubleshooting — common errors and recovery
- `/context` API reference — request/response schema, render branches, field-by-field
- Operator runbooks — backfill, calibrate, consolidate, vacuum
- Config reference — every key in `helix.toml`
- Agent SDK integration — frontier prompt fragment + compliance eval
- Environment variables — runtime overrides
```bash
# 1. Install
pip install -e ".[all,launcher,otel]"
python -m spacy download en_core_web_sm

# 2. Pull a small model for the ribosome (optional — disabled by default)
ollama pull gemma3:e4b

# 3. Start the proxy
python -m uvicorn helix_context._asgi:app --host 127.0.0.1 --port 11437

# 4. Or, daily-driver tray (Windows):
start-helix-tray.bat

# 5. Seed your genome (one-time)
python examples/seed_genome.py path/to/your/docs/

# 6. Post-merge: backfill 1024-dim BGE-M3 vectors for Stage 2 dense recall
python scripts/backfill_bgem3_v2.py genomes/main/genome.db
```

For full options, including the extras matrix, see docs/SETUP.md.
A transparent OpenAI-compatible proxy that intercepts LLM requests and injects compressed context from a persistent SQLite genome. Six stages per turn (plus a rule-based pre-classifier, 0a), all LLM-free except the optional Stage 4 splice (ribosome) and the downstream completion call:
- 0a. Classify — rule-based query classifier picks decoder mode + assembly cap (no model call).
- 1. Extract — heuristic keyword extraction from query (no model call).
- 2. Express — SQLite promoter-tag lookup + BGE-M3 dense recall + synonym expansion + co-activation. Ranks candidates via Reciprocal Rank Fusion over `{dense, FTS5, promoter, harmonic, SR}` when `[retrieval] fusion_mode = "rrf"` (default `"additive"`, see Gotchas).
- 3. Re-rank — small CPU model scores candidates for relevance.
- 4. Splice — small CPU model trims introns, keeps exons (batched single call; optional).
- 5. Assemble — join spliced parts, enforce token budget, wrap in tags. The Stage 7 health pass downgrades to `MissBlock(reason="stale")` when the top-1 source mtime exceeds `last_verified_at`.
- 6. Replicate — pack query + response into the genome (background).
→ Swim-lane reference: docs/architecture/PIPELINE_LANES.md
→ Retrieval dimensions: docs/architecture/DIMENSIONS.md
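Reciprocal Rank Fusion (Stage 2) needs no model call: it only combines per-channel rank lists via the standard formula score(d) = Σ 1/(k + rank_d), conventionally with k = 60. A minimal sketch under that reading — the channel names and gene ids below are illustrative toy data, not Helix internals:

```python
def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse per-channel rank lists with Reciprocal Rank Fusion.

    rankings maps a channel name (e.g. "dense", "fts5") to its candidate
    ids, best first. Returns (id, score) pairs, best first.
    """
    scores: dict[str, float] = {}
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Candidates surfaced by three of the five channels (toy data):
fused = rrf_fuse({
    "dense":    ["gene_a", "gene_b", "gene_c"],
    "fts5":     ["gene_b", "gene_a"],
    "promoter": ["gene_b"],
})
print(fused[0][0])  # gene_b — ranked by all three channels, so it wins
```

Because RRF only consumes ranks, not raw scores, the channels never need score normalization — one design reason fusion stays cheap and LLM-free.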
Every /context response carries one of two top-level blocks:
- `know { found, confidence, gene_id_match, ... }` — retrieval succeeded; the `expressed_context` bytes are grounded. The agent may answer from them.
- `miss { reason, escalate_to | refresh_targets, do_not_answer_from_genome: true }` — retrieval did NOT find it (or found it but it is stale, cold, or superseded). The agent should NOT answer from the genome; it should call an escalation tool from `escalate_to` (`grep | rag | web | ask_human`) or refetch from `refresh_targets`.
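On the caller side the contract reduces to a three-way branch. A minimal sketch of a dispatcher — the field names follow the blocks above, but treating `escalate_to` as a list and the handler strings are assumptions of this sketch:

```python
def dispatch(ctx_response: dict) -> str:
    """Decide the agent's next move from a parsed /context response."""
    if "know" in ctx_response:
        # Grounded: answer from the expressed_context bytes.
        return "answer_from_context"
    miss = ctx_response["miss"]
    # The contract marks every miss as not answerable from the genome.
    assert miss["do_not_answer_from_genome"] is True
    if "escalate_to" in miss:
        # One of: grep | rag | web | ask_human (assumed list-shaped here)
        return f"escalate:{miss['escalate_to'][0]}"
    # Otherwise re-read the listed sources before answering.
    return f"refresh:{len(miss['refresh_targets'])} targets"

print(dispatch({"know": {"found": True, "confidence": 0.91}}))
# -> answer_from_context
print(dispatch({"miss": {"reason": "stale",
                         "do_not_answer_from_genome": True,
                         "escalate_to": ["grep", "web"]}}))
# -> escalate:grep
```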
To make a frontier model honor the contract, prepend the helix-context prompt fragment to your system prompt:
```python
from helix_context.agent_prompt import full_fragment

system_prompt = full_fragment() + "\n\n" + your_existing_system_prompt
```

Without the fragment, frontier models will paper over `do_not_answer_from_genome` and confabulate. See docs/agent-sdk-fragment.md for the full template, the `<helix:no_match/>` token semantics, and the compliance eval recipe.
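A compliance eval boils down to one invariant per transcript: on a miss verdict, the reply must surface the no-match token instead of an answer. A toy checker under that assumed reading of the token semantics — the real recipe and exact semantics live in docs/agent-sdk-fragment.md:

```python
NO_MATCH = "<helix:no_match/>"

def is_compliant(verdict: str, reply: str) -> bool:
    """Assumed rule: on a miss the model must emit the no-match token."""
    if verdict == "miss":
        return NO_MATCH in reply
    return True  # know: any grounded answer passes this toy check

print(is_compliant("miss", f"{NO_MATCH} Escalating to grep."))  # True
print(is_compliant("miss", "The port is 11437."))               # False: confabulated
```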
`/context` accepts an optional `caller_model_class: "generic" | "small_moe" | "frontier"` field that selects the render branch:

- `frontier` (Claude Opus, GPT-5, Gemini 3 Pro): forward rank-1-first ordering, larger assembly cap, full decoder mode.
- `small_moe` (qwen3:4b, gemma3:e4b): foveated reverse-rank order, JSON-shaped char-bounded answer slate, condensed decoder.
- `generic` (default): regression-locked byte-identical to pre-Stage-5 behavior.
See docs/api/context-endpoint.md §7 for the full behavior matrix.
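Callers just set the field per model class. A hedged sketch of a request-body builder — the `query` field and the name-matching heuristic are assumptions of this sketch, not the documented schema:

```python
def context_request(query: str, model: str) -> dict:
    """Pick the /context render branch from the model name (heuristic)."""
    frontier_tags = ("claude", "gpt-5", "gemini")
    small_moe_tags = ("qwen3", "gemma3")
    name = model.lower()
    if any(tag in name for tag in frontier_tags):
        cls = "frontier"   # rank-1-first, larger cap, full decoder
    elif any(tag in name for tag in small_moe_tags):
        cls = "small_moe"  # foveated reverse-rank, condensed decoder
    else:
        cls = "generic"    # byte-identical to pre-Stage-5 behavior
    return {"query": query, "caller_model_class": cls}

print(context_request("where is the freshness gate?", "gemma3:e4b"))
```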
| Endpoint | Purpose |
|---|---|
| `POST /context` | `know`/`miss` + `expressed_context` (primary integration). |
| `POST /context/packet` | Agent-safe bundle: `verified` / `stale_risk` / `refresh_targets`. |
| `POST /context/refresh-plan` | `refresh_targets` only — reread plan, no evidence items. |
| `POST /fingerprint` | Navigation-first payload (scores + metadata, no body). |
| `POST /consolidate` | Rewrite stale gene bodies from their source fingerprints (Stage 7 counterpart to `refresh_targets`). |
| `POST /sessions/register` | Register an agent participant (taude / laude / …) for attribution. |
| `POST /admin/refresh` | Force a retrieval-layer refresh (admin only). |
| `POST /admin/vacuum` | Reclaim SQLite pages after compaction (admin only). |
| `POST /ingest` | Add a document or exchange to the genome. |
| `GET /stats` | Genome metrics + compression ratio. |
| `GET /health` | Ribosome model, gene count, upstream URL, calibration provenance (Stage 4). |
| `POST /v1/chat/completions` | OpenAI-compatible proxy with automatic context injection. |
→ Full endpoint reference: docs/api/endpoints.md and docs/api/context-endpoint.md
→ MCP tool schemas: docs/api/mcp-tools.md
Two surfaces, two caller types:
| | `/context` | `/context/packet` |
|---|---|---|
| Returns | Assembled compressed window | Pointer + verdict + refresh plan |
| LLM reads? | Directly | No — agent fetches if needed |
| Verdict emitted? | Top-level `know` / `miss` | First-class: `verified` / `stale_risk` / `needs_refresh` |
| Best for | Chat clients, Continue | MCP agents, programmatic use |
Add to `~/.continue/config.yaml`:
```yaml
models:
  - name: Helix (Local)
    provider: openai
    model: gemma3:e4b        # or whatever is loaded in Ollama
    apiBase: http://127.0.0.1:11437/v1
    apiKey: EMPTY
    roles: [chat]
    defaultCompletionOptions:
      contextLength: 128000  # Helix handles compression downstream
      maxTokens: 4096
```

Add to `~/.claude/settings.json`:
```json
{
  "mcpServers": {
    "helix-context": {
      "command": "python",
      "args": ["-m", "helix_context.mcp_server"],
      "cwd": "/absolute/path/to/your/project",
      "env": {
        "HELIX_MCP_URL": "http://127.0.0.1:11437"
      }
    }
  }
}
```

```bash
ANTHROPIC_BASE_URL=http://localhost:11437 claude
```
```bash
OPENAI_BASE_URL=http://localhost:11437/v1 your-app
```

Set `path` in `[genome]` to a file or directory:
```toml
[genome]
path = "genomes/main/genome.db"  # relative to helix run directory
# Put this on your fastest NVMe for best ingest throughput.
# Example: path = "D:/helix/genome.db"
```

One helix instance per genome — each reads its own helix.toml. Use the `helix_context.hgt` Python API to share genes across instances (Horizontal Gene Transfer).
SQLite WAL mode makes it safe to copy the `.db` file while helix is running:
```bash
# cron / Linux
cp genomes/main/genome.db backups/genome-$(date +%Y%m%d).db
```

```powershell
# PowerShell / Windows
Copy-Item genomes\main\genome.db backups\genome-$(Get-Date -Format yyyyMMdd).db
```

`/context/packet` returns `source_id` pointers. Callers resolve them to bytes via the DAL:
```python
from helix_context.adapters.dal import DAL

dal = DAL()                        # file + HTTP built-in
dal.register("s3", my_s3_fetcher)  # register additional schemes
text, meta = dal.fetch("s3://bucket/schema.json")
```

The tray (`start-helix-tray.bat`) manages the native OpenTelemetry binaries in tools/native-otel/ automatically. A balloon notification confirms the sidecar is running. To opt out: `HELIX_OBSERVABILITY=0 start-helix-tray.bat`.
Advanced — Docker stack: if you prefer a full Docker-compose observability stack (Prometheus, Tempo, Loki, Grafana), see deploy/otel/README.md.
| Doc | Topic |
|---|---|
| PIPELINE_LANES.md | Swim-lane reference: ingest, context, packet, fingerprint flows |
| DIMENSIONS.md | The 9 retrieval dimensions — schema, data, bench status |
| LAUNCHER.md | Supervisor, tray, observability stack lifecycle |
| SESSION_REGISTRY.md | Multi-agent session + party isolation |
| OBSERVABILITY.md | Prometheus metrics, Grafana dashboards, alert rules |
| KNOWLEDGE_GRAPH.md | Entity graph, harmonic links, co-activation |
- Model swap latency: the ribosome (small model) and the generation model share Ollama. Use `keep_alive = "30m"` in helix.toml to pin the ribosome in memory.
- Synonym map is critical: if queries return "no relevant context", check that query keywords map to the promoter tags the ribosome assigned. Add synonyms in `[synonyms]` of helix.toml.
- Short content may fail ingestion: the ribosome struggles with very short inputs (<200 chars). Pad with context or combine small files before ingesting.
- `genome.db` persists: delete it to start fresh. It auto-creates on first use.
- Continue Agent mode: use Chat mode, not Agent mode. The proxy does not handle tool routing.
- The `know`/`miss` block requires the agent prompt fragment to be honored — without it, frontier models confabulate. Import `helix_context.agent_prompt.full_fragment()` and prepend it to your system prompt.
- Stage 2 backfill is a one-time post-merge action — `embedding_dense_v2 IS NULL` until you run `scripts/backfill_bgem3_v2.py`. Symptom: `/context` retrieval rate plateaus low; check coverage via `sqlite3 genome.db "SELECT COUNT(*) FROM genes WHERE embedding_dense_v2 IS NOT NULL"`.
- Default `[retrieval] fusion_mode = "additive"` is back-compat; flip to `"rrf"` after running `scripts/calibrate_thresholds.py` so the absolute-floor gates do not strand every query in BROAD.
- Default `[abstain].mode = "global"` is back-compat; flip to `"per_classifier"` after calibration to use the bench-derived floors.
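Taken together, the post-calibration flips in the last two gotchas look like this in helix.toml (a sketch; apply only after `scripts/calibrate_thresholds.py` has run):

```toml
[retrieval]
fusion_mode = "rrf"      # default "additive"; flip after calibration

[abstain]
mode = "per_classifier"  # default "global"; uses bench-derived floors
```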
```bash
# All mock tests (no Ollama needed, ~6s)
python -m pytest tests/ -m "not live" -v

# Live tests (requires Ollama running)
python -m pytest tests/ -m live -v -s

# Full suite
python -m pytest tests/ -v
```

The 7-stage retrieval fix added stage-by-stage contract tests:
tests/test_dense_recall.py (Stage 2), tests/test_fusion_rrf.py (Stage 3),
tests/test_calibration.py (Stage 4), tests/test_caller_model_class.py (Stage 5),
tests/test_know_miss_block.py (Stage 6), tests/test_freshness_gate.py (Stage 7).
Built on: spaCy NER · Howard 2005 TCM · Stachenfeld 2017 SR · SQLite FTS5 BM25 · Kompress · Headroom
Licensed under Apache-2.0. See NOTICE for third-party attributions.