Genome-based context compression for local LLMs, with a machine-tagged know/miss agent contract.
Coordinate-index engine for LLM agents. Helix retrieves, weighs, and compresses your codebase into a context window — without a single LLM call on the retrieval path.
- Compression: collapses ~9k tokens of raw working set into a ~600 effective-token assembled context (28.7× headline on production workloads, 5.4× median across 15 query shapes).
- Agent contract: every `/context` response carries a top-level `know {}` (grounded, you may answer) or `miss {}` (`do_not_answer_from_genome: true`, plus `escalate_to` tools or `refresh_targets` paths). Stale `know` blocks downgrade to `miss` (`reason = "stale" | "cold" | "superseded"`) via the Stage 7 freshness gate.
- LLM-free retrieval: `/context` runs spaCy NER, SQLite FTS5 BM25, BGE-M3 dense recall, RRF fusion, Howard-2005 TCM, and Hebbian co-activation — zero ribosome calls. The only LLM call is downstream at `/v1/chat/completions`.
- Three install paths: compact tray flow (`start-helix-tray.bat`) for the daily driver, proxy-only for `OPENAI_BASE_URL` redirection, agent-SDK fragment for frontier-model integration.
- Setup guide — extras matrix, OS-specific install paths, calibration runbook
- Troubleshooting — common errors and recovery
- `/context` API reference — request/response schema, render branches, field-by-field
- Operator runbooks — backfill, calibrate, consolidate, vacuum
- Config reference — every key in `helix.toml`
- Agent SDK integration — frontier prompt fragment + compliance eval
- Environment variables — runtime overrides
```bash
# 1. Install
pip install -e ".[all,launcher,otel]"
python -m spacy download en_core_web_sm

# 2. Pull a small model for the ribosome (optional — disabled by default)
ollama pull gemma3:e4b

# 3. Start the proxy
python -m uvicorn helix_context._asgi:app --host 127.0.0.1 --port 11437

# 4. Or, daily-driver tray (Windows):
start-helix-tray.bat

# 5. Seed your genome (one-time)
python examples/seed_genome.py path/to/your/docs/

# 6. Post-merge: backfill 1024-dim BGE-M3 vectors for Stage 2 dense recall
python scripts/backfill_bgem3_v2.py genomes/main/genome.db
```

For full options, including the extras matrix, see docs/SETUP.md.
A transparent OpenAI-compatible proxy that intercepts LLM requests and injects compressed context from a persistent SQLite genome. Six stages per turn (plus a rule-based pre-classifier, 0a), all LLM-free except the optional Stage 4 splice (ribosome) and the downstream completion call:
- 0a. Classify — rule-based query classifier picks decoder mode + assembly cap (no model call).
- 1. Extract — heuristic keyword extraction from query (no model call).
- 2. Express — SQLite promoter-tag lookup + BGE-M3 dense recall + synonym expansion + co-activation. Ranks candidates via Reciprocal Rank Fusion over `{dense, FTS5, promoter, harmonic, SR}` when `[retrieval] fusion_mode = "rrf"` (default `"additive"`, see Gotchas).
- 3. Re-rank — small CPU model scores candidates for relevance.
- 4. Splice — small CPU model trims introns, keeps exons (batched single call; optional).
- 5. Assemble — join spliced parts, enforce token budget, wrap in tags. The Stage 7 health pass downgrades to `MissBlock(reason="stale")` when the top-1 source mtime exceeds `last_verified_at`.
- 6. Replicate — pack query + response into the genome (background).
→ Swim-lane reference: docs/architecture/PIPELINE_LANES.md
→ Retrieval dimensions: docs/architecture/DIMENSIONS.md
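Reciprocal Rank Fusion (Stage 2) needs no model call: it only combines per-channel rank lists via the standard formula score(d) = Σ 1/(k + rank_d), conventionally with k = 60. A minimal sketch under that reading — the channel names and gene ids below are illustrative toy data, not Helix internals:

```python
def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse per-channel rank lists with Reciprocal Rank Fusion.

    rankings maps a channel name (e.g. "dense", "fts5") to its candidate
    ids, best first. Returns (id, score) pairs, best first.
    """
    scores: dict[str, float] = {}
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Candidates surfaced by three of the five channels (toy data):
fused = rrf_fuse({
    "dense":    ["gene_a", "gene_b", "gene_c"],
    "fts5":     ["gene_b", "gene_a"],
    "promoter": ["gene_b"],
})
print(fused[0][0])  # gene_b — ranked by all three channels, so it wins
```

Because RRF only consumes ranks, not raw scores, the channels never need score normalization — one design reason fusion stays cheap and LLM-free.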
Every /context response carries one of two top-level blocks:
- `know { found, confidence, gene_id_match, ... }` — retrieval succeeded; the `expressed_context` bytes are grounded. The agent may answer from them.
- `miss { reason, escalate_to | refresh_targets, do_not_answer_from_genome: true }` — retrieval did NOT find it (or found it but it is stale, cold, or superseded). The agent should NOT answer from the genome; it should call an escalation tool from `escalate_to` (`grep | rag | web | ask_human`) or refetch from `refresh_targets`.
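On the caller side the contract reduces to a three-way branch. A minimal sketch of a dispatcher — the field names follow the blocks above, but treating `escalate_to` as a list and the handler strings are assumptions of this sketch:

```python
def dispatch(ctx_response: dict) -> str:
    """Decide the agent's next move from a parsed /context response."""
    if "know" in ctx_response:
        # Grounded: answer from the expressed_context bytes.
        return "answer_from_context"
    miss = ctx_response["miss"]
    # The contract marks every miss as not answerable from the genome.
    assert miss["do_not_answer_from_genome"] is True
    if "escalate_to" in miss:
        # One of: grep | rag | web | ask_human (assumed list-shaped here)
        return f"escalate:{miss['escalate_to'][0]}"
    # Otherwise re-read the listed sources before answering.
    return f"refresh:{len(miss['refresh_targets'])} targets"

print(dispatch({"know": {"found": True, "confidence": 0.91}}))
# -> answer_from_context
print(dispatch({"miss": {"reason": "stale",
                         "do_not_answer_from_genome": True,
                         "escalate_to": ["grep", "web"]}}))
# -> escalate:grep
```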
To make a frontier model honor the contract, prepend the helix-context prompt fragment to your system prompt:
```python
from helix_context.agent_prompt import full_fragment

system_prompt = full_fragment() + "\n\n" + your_existing_system_prompt
```

Without the fragment, frontier models will paper over `do_not_answer_from_genome` and confabulate. See docs/agent-sdk-fragment.md for the full template, the `<helix:no_match/>` token semantics, and the compliance eval recipe.
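A compliance eval boils down to one invariant per transcript: on a miss verdict, the reply must surface the no-match token instead of an answer. A toy checker under that assumed reading of the token semantics — the real recipe and exact semantics live in docs/agent-sdk-fragment.md:

```python
NO_MATCH = "<helix:no_match/>"

def is_compliant(verdict: str, reply: str) -> bool:
    """Assumed rule: on a miss the model must emit the no-match token."""
    if verdict == "miss":
        return NO_MATCH in reply
    return True  # know: any grounded answer passes this toy check

print(is_compliant("miss", f"{NO_MATCH} Escalating to grep."))  # True
print(is_compliant("miss", "The port is 11437."))               # False: confabulated
```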
`/context` accepts an optional `caller_model_class: "generic" | "small_moe" | "frontier"` field that selects the render branch:

- `frontier` (Claude Opus, GPT-5, Gemini 3 Pro): forward rank-1-first ordering, larger assembly cap, full decoder mode.
- `small_moe` (qwen3:4b, gemma3:e4b): foveated reverse-rank order, JSON-shaped char-bounded answer slate, condensed decoder.
- `generic` (default): regression-locked byte-identical to pre-Stage-5 behavior.
See docs/api/context-endpoint.md §7 for the full behavior matrix.
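Callers just set the field per model class. A hedged sketch of a request-body builder — the `query` field and the name-matching heuristic are assumptions of this sketch, not the documented schema:

```python
def context_request(query: str, model: str) -> dict:
    """Pick the /context render branch from the model name (heuristic)."""
    frontier_tags = ("claude", "gpt-5", "gemini")
    small_moe_tags = ("qwen3", "gemma3")
    name = model.lower()
    if any(tag in name for tag in frontier_tags):
        cls = "frontier"   # rank-1-first, larger cap, full decoder
    elif any(tag in name for tag in small_moe_tags):
        cls = "small_moe"  # foveated reverse-rank, condensed decoder
    else:
        cls = "generic"    # byte-identical to pre-Stage-5 behavior
    return {"query": query, "caller_model_class": cls}

print(context_request("where is the freshness gate?", "gemma3:e4b"))
```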
| Endpoint | Purpose |
|---|---|
| `POST /context` | `know`/`miss` + `expressed_context` (primary integration). |
| `POST /context/packet` | Agent-safe bundle: `verified` / `stale_risk` / `refresh_targets`. |
| `POST /context/refresh-plan` | `refresh_targets` only — reread plan, no evidence items. |
| `POST /fingerprint` | Navigation-first payload (scores + metadata, no body). |
| `POST /consolidate` | Rewrite stale gene bodies from their source fingerprints (Stage 7 counterpart to `refresh_targets`). |
| `POST /sessions/register` | Register an agent participant (taude / laude / …) for attribution. |
| `POST /admin/refresh` | Force a retrieval-layer refresh (admin only). |
| `POST /admin/vacuum` | Reclaim SQLite pages after compaction (admin only). |
| `POST /ingest` | Add a document or exchange to the genome. |
| `GET /stats` | Genome metrics + compression ratio. |
| `GET /health` | Ribosome model, gene count, upstream URL, calibration provenance (Stage 4). |
| `POST /v1/chat/completions` | OpenAI-compatible proxy with automatic context injection. |
→ Full endpoint reference: docs/api/endpoints.md and docs/api/context-endpoint.md
→ MCP tool schemas: docs/api/mcp-tools.md
Two surfaces, two caller types:
| | `/context` | `/context/packet` |
|---|---|---|
| Returns | Assembled compressed window | Pointer + verdict + refresh plan |
| LLM reads? | Directly | No — agent fetches if needed |
| Verdict emitted? | Top-level `know` / `miss` | First-class: `verified` / `stale_risk` / `needs_refresh` |
| Best for | Chat clients, Continue | MCP agents, programmatic use |
Add to `~/.continue/config.yaml`:
```yaml
models:
  - name: Helix (Local)
    provider: openai
    model: gemma3:e4b        # or whatever is loaded in Ollama
    apiBase: http://127.0.0.1:11437/v1
    apiKey: EMPTY
    roles: [chat]
    defaultCompletionOptions:
      contextLength: 128000  # Helix handles compression downstream
      maxTokens: 4096
```

Add to `~/.claude/settings.json`:
```json
{
  "mcpServers": {
    "helix-context": {
      "command": "python",
      "args": ["-m", "helix_context.mcp_server"],
      "cwd": "/absolute/path/to/your/project",
      "env": {
        "HELIX_MCP_URL": "http://127.0.0.1:11437"
      }
    }
  }
}
```

```bash
ANTHROPIC_BASE_URL=http://localhost:11437 claude
```
```bash
OPENAI_BASE_URL=http://localhost:11437/v1 your-app
```

Set `path` in `[genome]` to a file or directory:
```toml
[genome]
path = "genomes/main/genome.db"  # relative to helix run directory
# Put this on your fastest NVMe for best ingest throughput.
# Example: path = "D:/helix/genome.db"
```

One helix instance per genome — each reads its own helix.toml. Use the `helix_context.hgt` Python API to share genes across instances (Horizontal Gene Transfer).
SQLite WAL mode makes it safe to copy the `.db` file while helix is running:
```bash
# cron / Linux
cp genomes/main/genome.db backups/genome-$(date +%Y%m%d).db
```

```powershell
# PowerShell / Windows
Copy-Item genomes\main\genome.db backups\genome-$(Get-Date -Format yyyyMMdd).db
```

`/context/packet` returns `source_id` pointers. Callers resolve them to bytes via the DAL:
```python
from helix_context.adapters.dal import DAL

dal = DAL()                        # file + HTTP built-in
dal.register("s3", my_s3_fetcher)  # register additional schemes
text, meta = dal.fetch("s3://bucket/schema.json")
```

The tray (`start-helix-tray.bat`) manages the native OpenTelemetry binaries in tools/native-otel/ automatically. A balloon notification confirms the sidecar is running. To opt out: `HELIX_OBSERVABILITY=0 start-helix-tray.bat`.
Advanced — Docker stack: if you prefer a full Docker-compose observability stack (Prometheus, Tempo, Loki, Grafana), see deploy/otel/README.md.
| Doc | Topic |
|---|---|
| PIPELINE_LANES.md | Swim-lane reference: ingest, context, packet, fingerprint flows |
| DIMENSIONS.md | The 9 retrieval dimensions — schema, data, bench status |
| LAUNCHER.md | Supervisor, tray, observability stack lifecycle |
| SESSION_REGISTRY.md | Multi-agent session + party isolation |
| OBSERVABILITY.md | Prometheus metrics, Grafana dashboards, alert rules |
| KNOWLEDGE_GRAPH.md | Entity graph, harmonic links, co-activation |
- Model swap latency: the ribosome (small model) and the generation model share Ollama. Use `keep_alive = "30m"` in helix.toml to pin the ribosome in memory.
- Synonym map is critical: if queries return "no relevant context", check that query keywords map to the promoter tags the ribosome assigned. Add synonyms in `[synonyms]` of helix.toml.
- Short content may fail ingestion: the ribosome struggles with very short inputs (<200 chars). Pad with context or combine small files before ingesting.
- `genome.db` persists: delete it to start fresh. It auto-creates on first use.
- Continue Agent mode: use Chat mode, not Agent mode. The proxy does not handle tool routing.
- The `know`/`miss` block requires the agent prompt fragment to be honored — without it, frontier models confabulate. Import `helix_context.agent_prompt.full_fragment()` and prepend it to your system prompt.
- Stage 2 backfill is a one-time post-merge action — `embedding_dense_v2 IS NULL` until you run `scripts/backfill_bgem3_v2.py`. Symptom: `/context` retrieval rate plateaus low; check coverage via `sqlite3 genome.db "SELECT COUNT(*) FROM genes WHERE embedding_dense_v2 IS NOT NULL"`.
- Default `[retrieval] fusion_mode = "additive"` is back-compat; flip to `"rrf"` after running `scripts/calibrate_thresholds.py` so the absolute-floor gates do not strand every query in BROAD.
- Default `[abstain].mode = "global"` is back-compat; flip to `"per_classifier"` after calibration to use the bench-derived floors.
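Taken together, the post-calibration flips in the last two gotchas look like this in helix.toml (a sketch; apply only after `scripts/calibrate_thresholds.py` has run):

```toml
[retrieval]
fusion_mode = "rrf"      # default "additive"; flip after calibration

[abstain]
mode = "per_classifier"  # default "global"; uses bench-derived floors
```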
```bash
# All mock tests (no Ollama needed, ~6s)
python -m pytest tests/ -m "not live" -v

# Live tests (requires Ollama running)
python -m pytest tests/ -m live -v -s

# Full suite
python -m pytest tests/ -v
```

The 7-stage retrieval fix added stage-by-stage contract tests:
tests/test_dense_recall.py (Stage 2), tests/test_fusion_rrf.py (Stage 3),
tests/test_calibration.py (Stage 4), tests/test_caller_model_class.py (Stage 5),
tests/test_know_miss_block.py (Stage 6), tests/test_freshness_gate.py (Stage 7).
Built on: spaCy NER · Howard 2005 TCM · Stachenfeld 2017 SR · SQLite FTS5 BM25 · Kompress · Headroom
Licensed under Apache-2.0. See NOTICE for third-party attributions.