SynapsesOS/synapses-intelligence

Synapses Intelligence — AI Brain Sidecar


Synapses Intelligence adds semantic understanding to your codebase via local LLMs. No cloud required. No data leaves your machine. Runs on CPU or GPU.

Synapses (graph) ↔ Intelligence Sidecar
                    ↓
              llama-server | Ollama | Local CGo
                    ↓
            LLM Inference (Qwen, Mistral, etc.)

The brain sidecar enriches code graphs with:

  • Semantic summaries — prose descriptions of what code does
  • Architectural insight — how an entity fits into the system
  • Rule explanations — why architectural violations matter
  • Context packets — curated 800-token summaries for AI agents
  • Episodic learning — co-occurrence patterns from past decisions

What is Synapses Intelligence?

Synapses Intelligence is the reasoning layer for the Synapses code intelligence engine. It adds LLM-powered semantic enrichment to raw code graphs:

  • Input — graph of code entities (nodes, edges, summaries)
  • Process — 4-tier LLM system (Tiers 0–3), with different models for different tasks
  • Output — context packets (~800 tokens), rule explanations, architectural insights

The brain is optional but powerful. Without it, Synapses still works via pure graph queries. With it, agents understand not just structure but intent.


4-Tier LLM Architecture

The brain routes tasks to different LLM tiers for efficiency:

| Tier | Name | Purpose | GPU Model | CPU Model | Latency |
|------|------|---------|-----------|-----------|---------|
| 0 | Reflex | Fast prose summaries, boilerplate removal | qwen3.5:0.8b | qwen2.5-coder:1.5b | 3s |
| 1 | Sensory | Explain rule violations (cached) | qwen3.5:2b | qwen2.5-coder:1.5b | 5s |
| 2 | Specialist | Architectural insight, context packets | qwen3.5:4b | qwen2.5-coder:7b | 12s |
| 3 | Architect | Multi-agent conflict resolution | qwen3.5:9b | qwen2.5-coder:7b | 25s |

Key insight: Different tasks need different model sizes. Summaries (Tier 0) run fast on 0.8B. Context packets (Tier 2) need 4B reasoning. Most operations use Tier 0 or Tier 1, keeping inference fast.
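To illustrate the routing idea, here is a minimal sketch of a task-to-tier lookup. The task names and the `route` function are hypothetical (the actual router lives inside the sidecar); the tier/model pairs mirror the GPU column of the table above.

```python
# Hypothetical sketch of tier routing: map a task type to the tier and
# model from the table above. These pairs use the GPU column; a CPU-only
# host would swap in the CPU models instead.
TIERS = {
    "ingest": (0, "qwen3.5:0.8b"),           # Tier 0 "Reflex": fast summaries
    "explain_violation": (1, "qwen3.5:2b"),  # Tier 1 "Sensory": cached explanations
    "enrich": (2, "qwen3.5:4b"),             # Tier 2 "Specialist": context packets
    "coordinate": (3, "qwen3.5:9b"),         # Tier 3 "Architect": conflict resolution
}

def route(task: str) -> tuple[int, str]:
    """Return (tier, model) for a task, defaulting to the cheap Tier 0."""
    return TIERS.get(task, TIERS["ingest"])
```

Defaulting unknown tasks to Tier 0 keeps the common path cheap, matching the note above that most operations run on Tier 0 or Tier 1.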


Install

Note: Synapses Intelligence is an optional sidecar — Synapses works without it. Install it to add LLM-powered semantic enrichment.

macOS / Linux (Homebrew):

brew tap SynapsesOS/tap
brew install brain

Direct binary download (GitHub Releases):

| Platform | File |
|----------|------|
| macOS (Apple Silicon) | brain_darwin_arm64.tar.gz |
| macOS (Intel) | brain_darwin_x86_64.tar.gz |
| Linux (x86_64) | brain_linux_x86_64.tar.gz |
| Linux (ARM64) | brain_linux_arm64.tar.gz |
| Windows | brain_windows_x86_64.zip |

Extract and place the brain binary on your PATH.


Quick Start

1. Recommended: llama-server (CPU/GPU Auto-Detect)

No Ollama needed. Subprocess-managed llama-server with OpenAI-compatible API. Auto-detects Metal (macOS), CUDA, ROCm, CPU.

# Setup: downloads llama-server binary + GGUF model
brain setup --llama-server

# Configure the model (optional)
brain config hf-repo Qwen/Qwen3.5-4B-Instruct-GGUF
brain config hf-filename qwen3.5-4b-instruct-q4_k_m.gguf
brain config download

# Start the sidecar
brain serve

Then configure Synapses:

{
  "brain": {
    "url": "http://localhost:11435",
    "timeout_sec": 60,
    "enable_llm": true
  }
}

2. Ollama Backend (Requires Ollama Sidecar)

# Start Ollama (separate terminal)
ollama serve

# Setup brain for Ollama
brain setup

# Start the sidecar
brain serve

3. Local CGo (In-Process, No Subprocess)

# Requires C++ toolchain for compilation
CGO_ENABLED=1 go build -tags llamacpp ./cmd/brain/

# Start (uses local binary)
brain setup --local
brain serve

HTTP API Reference

All endpoints run on localhost:11435 by default (port configurable in brain.json).

Health & Status

| Endpoint | Method | Description |
|----------|--------|-------------|
| /v1/health | GET | LLM liveness + model name + available status |
| /v1/summary/{nodeId} | GET | Fetch cached semantic summary for one node |
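A client typically polls /v1/health before enabling LLM features. The parser below is a hedged sketch: the field names `available` and `model` are assumptions inferred from the table's description, so check them against your sidecar's actual payload.

```python
import json

def brain_ready(raw: str) -> bool:
    """Parse a /v1/health body and decide whether the brain is usable.

    Field names ("available", "model") are assumptions based on the
    description above, not a confirmed schema.
    """
    payload = json.loads(raw)
    return bool(payload.get("available")) and bool(payload.get("model"))
```

When the sidecar is up, the raw body would come from a GET to `http://localhost:11435/v1/health`.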

Core LLM Tasks

| Endpoint | Tier | Params | Response | Description |
|----------|------|--------|----------|-------------|
| /v1/ingest | 0 | node_id, code | summary, tags | Generate prose briefing for a code entity |
| /v1/enrich | 2 | root_id, neighbors, task_context | insight, concerns, llm_used | Architectural insight for entity cluster |
| /v1/explain-violation | 1 | rule_id, source_file, target_name | explanation, fix | Plain-English rule explanation (cached) |
| /v1/coordinate | 3 | new_agent_id, conflicting_claims | suggestion, alternative_scope | Multi-agent conflict resolution |
| /v1/context-packet | optional | snapshot, phase, quality_mode | ContextPacket JSON | Phase-aware context assembly |
| /v1/prune | 0 | content | pruned, length_before, length_after | Strip web boilerplate, return technical content |
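As an example of calling one of these endpoints, the helper below builds (but does not send) a POST to /v1/explain-violation using only the parameter names from the table. The flat JSON body shape is an assumption; the sidecar may expect a different envelope.

```python
import json
import urllib.request

def explain_violation_request(rule_id, source_file, target_name,
                              base_url="http://localhost:11435"):
    """Build a POST request for /v1/explain-violation.

    Parameter names come from the table above; the flat JSON body is an
    assumed shape, not a confirmed schema.
    """
    body = json.dumps({
        "rule_id": rule_id,
        "source_file": source_file,
        "target_name": target_name,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/explain-violation",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the sidecar running, send it via `urllib.request.urlopen(req)` and read the `explanation` and `fix` fields from the response.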

SDLC & Decision Log

| Endpoint | Method | Params | Description |
|----------|--------|--------|-------------|
| /v1/sdlc | GET | — | Get current SDLC phase + quality mode |
| /v1/sdlc/phase | PUT | phase, agent_id | Set phase (planning/development/testing/review/release) |
| /v1/sdlc/mode | PUT | mode, agent_id | Set mode (standard/enterprise) |
| /v1/decision | POST | agent_id, phase, entity_name, action, outcome, notes | Log agent action to learning loop |
| /v1/patterns | GET | trigger (optional), limit | Learned co-occurrence patterns |

Architectural Decision Records (ADRs)

| Endpoint | Method | Params | Description |
|----------|--------|--------|-------------|
| /v1/adr | POST | id, title, status, decision, context, consequences, linked_files | Create/update ADR |
| /v1/adr | GET | file (optional) | List ADRs; optionally filter by file |
| /v1/adr/{id} | GET | — | Get single ADR by ID |

Embeddings

| Endpoint | Method | Input | Output | Description |
|----------|--------|-------|--------|-------------|
| /v1/embed | POST | `{"input": "text"}` or `{"input": ["text1", "text2"]}` | `{"embedding": [...]}` or `{"embeddings": [...]}` | Single or batch embeddings (nomic-embed-text, 768-dim) |
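Embedding vectors like the 768-dim ones returned here are usually compared with cosine similarity. A minimal, dependency-free sketch (the server call itself is omitted; only the comparison is shown):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors, e.g. two vectors
    pulled from a /v1/embed batch response."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Scores near 1.0 mean the two texts are semantically close; near 0.0 means unrelated.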

brain.json Configuration

Default path: ~/.synapses/brain.json. Override with $BRAIN_CONFIG env var.

{
  "enabled": true,
  "backend": "llama-server",
  "port": 11435,
  "timeout_ms": 120000,
  "db_path": "~/.synapses/brain.sqlite",

  "model_ingest": "qwen2.5-coder:1.5b",
  "model_guardian": "qwen2.5-coder:1.5b",
  "model_enrich": "qwen2.5-coder:7b",
  "model_orchestrate": "qwen2.5-coder:7b",

  "ingest": true,
  "enrich": true,
  "guardian": true,
  "orchestrate": true,
  "context_builder": true,
  "learning_enabled": true,

  "default_phase": "development",
  "default_mode": "standard",

  "llama_server_port": 11438,
  "hf_repo": "Qwen/Qwen3.5-4B-Instruct-GGUF",
  "hf_filename": "qwen3.5-4b-instruct-q4_k_m.gguf",

  "embedding_enabled": true,
  "embed_hf_repo": "nomic-ai/nomic-embed-text-v1.5-GGUF",
  "embed_hf_filename": "nomic-embed-text-v1.5.Q4_K_M.gguf",
  "embed_port": 11437
}

Key fields:

  • backend — llama-server, ollama, or local
  • timeout_ms — LLM inference timeout (120000 = 120s for CPU)
  • model_* — Per-tier model configuration
  • hf_repo / hf_filename — HuggingFace model download (llama-server backend)
  • embedding_enabled — Enable vector embeddings (requires embed_port)
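A client that wants to respect this file can load it with the same lookup order. The sketch below is a hypothetical helper, not part of the brain CLI; only the `$BRAIN_CONFIG` override and `~/.synapses/brain.json` default come from the docs above.

```python
import json
import os

def load_brain_config(path=None):
    """Load brain.json, honoring $BRAIN_CONFIG and the ~/.synapses default.

    Hypothetical helper: the lookup order mirrors the docs above, the rest
    is a plain JSON read.
    """
    path = (path
            or os.environ.get("BRAIN_CONFIG")
            or os.path.expanduser("~/.synapses/brain.json"))
    with open(path) as f:
        cfg = json.load(f)
    # Paths like "~/.synapses/brain.sqlite" need tilde expansion before use.
    if "db_path" in cfg:
        cfg["db_path"] = os.path.expanduser(cfg["db_path"])
    return cfg
```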

CLI Commands

| Command | Description |
|---------|-------------|
| brain serve | Start HTTP sidecar server on configured port |
| brain status | Show LLM status, model, SQLite stats, feature flags, SDLC config |
| brain config &lt;key&gt; &lt;value&gt; | Set config field and persist to brain.json |
| brain setup | Interactive setup (Ollama backend): probe models, detect GPU, write config |
| brain setup --llama-server | Setup llama-server backend: download binary + GGUF |
| brain setup --local | Setup local CGo backend: configure for in-process inference |
| brain ingest &lt;json&gt; | Manually trigger ingest task (for testing) |
| brain summaries | List all cached semantic summaries |
| brain sdlc | Show/set SDLC phase and mode |
| brain decisions [entity] | List decision log entries (optionally filtered by entity) |
| brain patterns | List learned co-occurrence patterns sorted by confidence |
| brain reset | Clear all brain.sqlite data (prompts for confirmation) |
| brain benchmark | Measure latency of all configured LLM models |
| brain version | Print version |

SDLC Phase Awareness

The brain is aware of your project's SDLC phase. Different phases get different context:

| Phase | Mode | Checklist | Use Case |
|-------|------|-----------|----------|
| planning | standard/enterprise | Requirements, design, dependencies | Architecting new features |
| development | standard/enterprise | Code review, testing, integration | Daily coding |
| testing | standard/enterprise | Test coverage, edge cases, performance | QA and validation |
| review | standard/enterprise | Release notes, changelog, deprecations | Pre-release |
| release | standard/enterprise | Rollout plan, rollback, monitoring | Deployment |

Set the phase via brain sdlc phase <phase> or the /v1/sdlc/phase API.
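A client can guard against typos before calling the API. The validator below is a hypothetical client-side sketch; the legal phase and mode values are exactly those listed in the table above.

```python
# Hypothetical client-side guard mirroring the phase/mode values above.
PHASES = ("planning", "development", "testing", "review", "release")
MODES = ("standard", "enterprise")

def validate_sdlc(phase, mode):
    """Raise ValueError before sending a bad value to /v1/sdlc/phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase {phase!r}, expected one of {PHASES}")
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {MODES}")
```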


Privacy Guarantee

All inference is local. No code or context leaves your machine.

  • ✅ No cloud APIs — models run on localhost (llama-server, Ollama, or in-process).
  • ✅ No telemetry — no metrics, tracking, or logging to external services.
  • ✅ SQLite-only storage — all summaries, insights, and decisions stored locally in brain.sqlite.


Contributing

We welcome contributions! See CONTRIBUTING.md for:

  • How to add a new LLM backend
  • How to add a new HTTP route
  • Testing with MockLLMClient (no Ollama needed)

License

MIT License — See LICENSE for details.


About

The Thinking Brain for Synapses — a local LLM sidecar that adds semantic reasoning, context packets, and co-occurrence learning to the code-graph MCP server.
