Synapses Intelligence adds semantic understanding to your codebase via local LLMs. No cloud required. No data leaves your machine. Runs on CPU or GPU.
```
Synapses (graph) ↔ Intelligence Sidecar
                ↓
    llama-server | Ollama | Local CGo
                ↓
    LLM Inference (Qwen, Mistral, etc.)
```
The brain sidecar enriches code graphs with:
- Semantic summaries — prose descriptions of what code does
- Architectural insight — how an entity fits into the system
- Rule explanations — why architectural violations matter
- Context packets — curated 800-token summaries for AI agents
- Episodic learning — co-occurrence patterns from past decisions
Synapses Intelligence is the reasoning layer for the Synapses code intelligence engine. It adds LLM-powered semantic enrichment to raw code graphs:
- Input: graph of code entities (nodes, edges, summaries)
- Process: 4-tier LLM system (Tier 0/1/2/3) — different models for different tasks
- Output: context packets (~800 tokens), rule explanations, architectural insights
The brain is optional but powerful. Without it, Synapses still works via pure graph queries. With it, agents understand not just structure but intent.
The brain routes tasks to different LLM tiers for efficiency:
| Tier | Name | Purpose | GPU Model | CPU Model | Latency |
|---|---|---|---|---|---|
| 0 | Reflex | Fast prose summaries, boilerplate removal | qwen3.5:0.8b | qwen2.5-coder:1.5b | 3s |
| 1 | Sensory | Explain rule violations (cached) | qwen3.5:2b | qwen2.5-coder:1.5b | 5s |
| 2 | Specialist | Architectural insight, context packets | qwen3.5:4b | qwen2.5-coder:7b | 12s |
| 3 | Architect | Multi-agent conflict resolution | qwen3.5:9b | qwen2.5-coder:7b | 25s |
Key insight: Different tasks need different model sizes. Summaries (Tier 0) run fast on 0.8B. Context packets (Tier 2) need 4B reasoning. Most operations use Tier 0 or Tier 1, keeping inference fast.
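The routing above can be pictured as a small dispatch function. A sketch — the tier names come from the table, but `routeTask` and its task-to-tier mapping are illustrative, not the sidecar's actual code:

```go
package main

import "fmt"

// Tier models a brain inference tier (names from the table above).
type Tier int

const (
	Reflex     Tier = 0 // fast summaries, boilerplate pruning
	Sensory    Tier = 1 // cached rule explanations
	Specialist Tier = 2 // architectural insight, context packets
	Architect  Tier = 3 // multi-agent conflict resolution
)

// routeTask picks a tier for a task kind, defaulting to the cheapest tier.
// The mapping mirrors the endpoint table later in this document.
func routeTask(task string) Tier {
	switch task {
	case "enrich", "context-packet":
		return Specialist
	case "explain-violation":
		return Sensory
	case "coordinate":
		return Architect
	default: // "ingest", "prune", and anything unrecognized stay cheap
		return Reflex
	}
}

func main() {
	for _, t := range []string{"ingest", "explain-violation", "enrich", "coordinate"} {
		fmt.Printf("%s -> tier %d\n", t, routeTask(t))
	}
}
```

Defaulting unknown tasks to Tier 0 keeps the common path fast; only explicitly expensive tasks escalate.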
Note: Synapses Intelligence is an optional sidecar — Synapses works without it. Install it to add LLM-powered semantic enrichment.
macOS / Linux (Homebrew):

```shell
brew tap SynapsesOS/tap
brew install brain
```

Direct binary download — GitHub Releases:
| Platform | File |
|---|---|
| macOS (Apple Silicon) | brain_darwin_arm64.tar.gz |
| macOS (Intel) | brain_darwin_x86_64.tar.gz |
| Linux (x86_64) | brain_linux_x86_64.tar.gz |
| Linux (ARM64) | brain_linux_arm64.tar.gz |
| Windows | brain_windows_x86_64.zip |
Extract and place the brain binary on your PATH.
No Ollama needed. Subprocess-managed llama-server with OpenAI-compatible API. Auto-detects Metal (macOS), CUDA, ROCm, CPU.
```shell
# Setup: downloads llama-server binary + GGUF model
brain setup --llama-server

# Configure the model (optional)
brain config hf-repo Qwen/Qwen3.5-4B-Instruct-GGUF
brain config hf-filename qwen3.5-4b-instruct-q4_k_m.gguf
brain config download

# Start the sidecar
brain serve
```

Then configure Synapses:
```json
{
  "brain": {
    "url": "http://localhost:11435",
    "timeout_sec": 60,
    "enable_llm": true
  }
}
```

```shell
# Start Ollama (separate terminal)
ollama serve

# Setup brain for Ollama
brain setup

# Start the sidecar
brain serve
```

```shell
# Requires C++ toolchain for compilation
CGO_ENABLED=1 go build -tags llamacpp ./cmd/brain/

# Start (uses local binary)
brain setup --local
brain serve
```

All endpoints run on localhost:11435 by default (port configurable in brain.json).
| Endpoint | Method | Description |
|---|---|---|
| `/v1/health` | GET | LLM liveness + model name + available status |
| `/v1/summary/{nodeId}` | GET | Fetch cached semantic summary for one node |
| Endpoint | Tier | Params | Response | Description |
|---|---|---|---|---|
| `/v1/ingest` | 0 | node_id, code | summary, tags | Generate prose briefing for a code entity |
| `/v1/enrich` | 2 | root_id, neighbors, task_context | insight, concerns, llm_used | Architectural insight for entity cluster |
| `/v1/explain-violation` | 1 | rule_id, source_file, target_name | explanation, fix | Plain-English rule explanation (cached) |
| `/v1/coordinate` | 3 | new_agent_id, conflicting_claims | suggestion, alternative_scope | Multi-agent conflict resolution |
| `/v1/context-packet` | optional | snapshot, phase, quality_mode | ContextPacket JSON | Phase-aware context assembly |
| `/v1/prune` | 0 | content | pruned, length_before, length_after | Strip web boilerplate, return technical content |
| Endpoint | Method | Params | Description |
|---|---|---|---|
| `/v1/sdlc` | GET | — | Get current SDLC phase + quality mode |
| `/v1/sdlc/phase` | PUT | phase, agent_id | Set phase (planning/development/testing/review/release) |
| `/v1/sdlc/mode` | PUT | mode, agent_id | Set mode (standard/enterprise) |
| `/v1/decision` | POST | agent_id, phase, entity_name, action, outcome, notes | Log agent action to learning loop |
| `/v1/patterns` | GET | trigger (optional), limit | Learned co-occurrence patterns |
| Endpoint | Method | Params | Description |
|---|---|---|---|
| `/v1/adr` | POST | id, title, status, decision, context, consequences, linked_files | Create/update ADR |
| `/v1/adr` | GET | file (optional) | List ADRs; optionally filter by file |
| `/v1/adr/{id}` | GET | — | Get single ADR by ID |
| Endpoint | Method | Input | Output | Description |
|---|---|---|---|---|
| `/v1/embed` | POST | `{"input": "text"}` or `{"input": ["text1", "text2"]}` | `{"embedding": [...]}` or `{"embeddings": [...]}` | Single or batch embeddings (nomic-embed-text, 768-dim) |
Default path: ~/.synapses/brain.json. Override with $BRAIN_CONFIG env var.
```json
{
  "enabled": true,
  "backend": "llama-server",
  "port": 11435,
  "timeout_ms": 120000,
  "db_path": "~/.synapses/brain.sqlite",
  "model_ingest": "qwen2.5-coder:1.5b",
  "model_guardian": "qwen2.5-coder:1.5b",
  "model_enrich": "qwen2.5-coder:7b",
  "model_orchestrate": "qwen2.5-coder:7b",
  "ingest": true,
  "enrich": true,
  "guardian": true,
  "orchestrate": true,
  "context_builder": true,
  "learning_enabled": true,
  "default_phase": "development",
  "default_mode": "standard",
  "llama_server_port": 11438,
  "hf_repo": "Qwen/Qwen3.5-4B-Instruct-GGUF",
  "hf_filename": "qwen3.5-4b-instruct-q4_k_m.gguf",
  "embedding_enabled": true,
  "embed_hf_repo": "nomic-ai/nomic-embed-text-v1.5-GGUF",
  "embed_hf_filename": "nomic-embed-text-v1.5.Q4_K_M.gguf",
  "embed_port": 11437
}
```

Key fields:
- `backend` — `llama-server`, `ollama`, or `local`
- `timeout_ms` — LLM inference timeout (120000 = 120s for CPU)
- `model_*` — Per-tier model configuration
- `hf_repo` / `hf_filename` — HuggingFace model download (llama-server backend)
- `embedding_enabled` — Enable vector embeddings (requires `embed_port`)
| Command | Description |
|---|---|
| `brain serve` | Start HTTP sidecar server on configured port |
| `brain status` | Show LLM status, model, SQLite stats, feature flags, SDLC config |
| `brain config <key> <value>` | Set config field and persist to brain.json |
| `brain setup` | Interactive setup (Ollama backend): probe models, detect GPU, write config |
| `brain setup --llama-server` | Setup llama-server backend: download binary + GGUF |
| `brain setup --local` | Setup local CGo backend: configure for in-process inference |
| `brain ingest <json>` | Manually trigger ingest task (for testing) |
| `brain summaries` | List all cached semantic summaries |
| `brain sdlc` | Show/set SDLC phase and mode |
| `brain decisions [entity]` | List decision log entries (optionally filtered by entity) |
| `brain patterns` | List learned co-occurrence patterns sorted by confidence |
| `brain reset` | Clear all brain.sqlite data (prompts for confirmation) |
| `brain benchmark` | Measure latency of all configured LLM models |
| `brain version` | Print version |
The brain is aware of your project's SDLC phase. Different phases get different context:
| Phase | Mode | Checklist | Use Case |
|---|---|---|---|
| planning | standard/enterprise | Requirements, design, dependencies | Architecting new features |
| development | standard/enterprise | Code review, testing, integration | Daily coding |
| testing | standard/enterprise | Test coverage, edge cases, performance | QA and validation |
| review | standard/enterprise | Release notes, changelog, deprecations | Pre-release |
| release | standard/enterprise | Rollout plan, rollback, monitoring | Deployment |
Set the phase via `brain sdlc phase <phase>` or the `/v1/sdlc/phase` API.
✅ All inference is local. No code or context leaves your machine.
✅ No cloud APIs. Models run on localhost (llama-server, Ollama, or in-process).
✅ No telemetry. No metrics, tracking, or logging to external services.
✅ SQLite-only storage. All summaries, insights, and decisions stored locally in brain.sqlite.
We welcome contributions! See CONTRIBUTING.md for:
- How to add a new LLM backend
- How to add a new HTTP route
- Testing with MockLLMClient (no Ollama needed)
MIT License — See LICENSE for details.
- GitHub: https://github.com/SynapsesOS/synapses-intelligence
- Core Server: https://github.com/SynapsesOS/synapses
- Web Intelligence: https://github.com/SynapsesOS/synapses-scout
- Organization: https://github.com/SynapsesOS
- Issues: https://github.com/SynapsesOS/synapses-intelligence/issues
- Discussions: https://github.com/SynapsesOS/synapses-intelligence/discussions
- Security: security@synapsesos.dev (see SECURITY.md)