Give Claude, GPT-4, Gemini, and Ollama a persistent memory.
MemLayer is a universal, encrypted memory layer that sits between you and any AI. It captures every interaction, extracts durable knowledge, and injects only the relevant context into future prompts — so each session picks up where the last one left off without re-sending your entire conversation history.
You → mem-ai ask "How do I fix this React hook?"
↓
Retrieves relevant memories (~500 tokens of facts)
instead of re-sending your full session history
↓
claude / gpt-4o / gemini / ollama → Answer
↓
Captures this interaction as a new memory
How much it actually saves depends on your session length. If your conversation history is 10,000 tokens and the injected memory context is 600 tokens, you save 9,400 tokens on that query. The economics dashboard shows you the real numbers from your own usage.
| Problem | MemLayer's answer |
|---|---|
| LLMs are stateless — every session starts from zero | Persistent encrypted memory store per user |
| Re-sending full history costs more each turn | Extract-compress-retrieve: inject only what matters |
| Memory solutions are provider-locked | Universal adapter: Claude, GPT-4, Gemini, Ollama |
| No visibility into what memory is saving you | Economics engine: tracks real token/cost savings per query |
| Memory stored in the cloud is a privacy risk | AES-256-GCM encryption, per-user key derivation |
- Universal CLI wrapper — wrap any AI CLI:
mem-ai ask "prompt" --provider claude - Economics dashboard — see actual tokens saved, cost saved ($), per provider, from your own usage
- MCP server — native Claude Code integration via Model Context Protocol
- Browser extension — auto-injects memory into ChatGPT and Claude web interfaces
- Encrypted at rest — AES-256-GCM with PBKDF2 key derivation, no key storage
- Composite retrieval — semantic similarity + recency + importance scoring
- Self-hostable — full Docker Compose stack, no cloud dependency
cp backend/.env.example backend/.env
# Fill in at least one: OPENAI_API_KEY or ANTHROPIC_API_KEY
docker compose up -dThe API is now running at http://localhost:8000.
Dashboard at http://localhost:3000.
cd cli
pip install -e .
mem-ai setup # interactive first-time setup
mem-ai auth login# Wrap any AI in your terminal:
mem-ai ask "How do I write a Rust async function?" --provider claude
mem-ai ask "Review this Python class" --provider openai --model gpt-4o
mem-ai ask "Explain my project structure" --provider gemini
mem-ai ask "Help me debug this" --provider ollama --model llama3
# Search your memories:
mem-ai search "React hooks"
# See your actual savings from your own usage:
mem-ai statsAdd to .claude/settings.json:
{
"mcpServers": {
"memory": {
"command": "python",
"args": ["-m", "mem_ai.mcp_server"],
"env": {
"MEM_AI_API_URL": "http://localhost:8000",
"MEM_AI_TOKEN": "your-jwt-token"
}
}
}
}Now Claude Code can automatically search your memories and capture new ones.
┌─────────────────────────────────────────────────────────────────────┐
│ USER WORKFLOW │
│ │
│ mem-ai CLI / Browser Extension / MCP (Claude Code) │
│ │ ▲ │
│ │ prompt │ augmented prompt │
│ ▼ │ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ MEMORY LAYER (localhost:8000) │ │
│ │ • Semantic search over user memories (pgvector) │ │
│ │ • Composite re-ranking: 0.7×sim + 0.2×recency + 0.1×imp │ │
│ │ • Token/cost accounting per query │ │
│ └──────────────────────────┬───────────────────────────────────┘ │
│ │ │
└─────────────────────────────┼───────────────────────────────────────┘
│
┌───────────────┼────────────────┐
▼ ▼ ▼
FastAPI PostgreSQL Redis
backend + pgvector (Celery queue)
│
┌─────────▼──────────┐
│ Processing Pipeline│
│ 1. Fact extraction │ ← gpt-4o-mini / claude-haiku
│ 2. Summarize │
│ 3. Classify type │
│ 4. Embed (1536-d) │
│ 5. Store encrypted │ ← AES-256-GCM
└────────────────────┘
final_score = 0.70 × cosine_similarity(query, memory)
+ 0.20 × 2^(-days_old / 30) # recency decay, 30-day half-life
+ 0.10 × importance_score # 0.0–1.0
Every call to /memory/context logs the token counts and computes the savings:
tokens_saved = full_history_tokens - injected_context_tokens
cost_saved = tokens_saved × price_per_1M_tokens / 1,000,000
The full_history_tokens is the actual token count of the conversation history you would have sent without memory. The injected_context_tokens is what was actually injected. The difference is what you saved.
Provider pricing used (per 1M input tokens — verify current rates at each provider):
| Provider | Model | Price |
|---|---|---|
| Claude | Sonnet 4.5 | $3.00 |
| Claude | Haiku 4.5 | $0.80 |
| OpenAI | GPT-4o | $2.50 |
| OpenAI | GPT-4o-mini | $0.15 |
| Gemini | 2.0 Flash | $0.10 |
| Ollama | any | $0.00 |
POST /api/v1/auth/register Register a new user
POST /api/v1/auth/login Get JWT token
POST /api/v1/memory/capture Capture prompt+response (async processing)
POST /api/v1/memory/context Get memory-augmented prompt
GET /api/v1/memory/search?q=... Semantic search
GET /api/v1/memory/ List memories
GET /api/v1/analytics/summary?days=30 Aggregate savings from your usage
GET /api/v1/analytics/timeline?days=30 Daily time series
GET /api/v1/analytics/providers Per-provider breakdown
GET /api/v1/analytics/logs?limit=50 Raw request log
mem-ai ask <prompt> Send a prompt through the memory layer
--provider claude|openai|gemini|ollama
--model <model-name>
--platform <name> Tag this session
mem-ai capture <prompt> <resp> Manually capture an exchange
mem-ai search <query> Search your memories
mem-ai stats Show savings dashboard (from your actual usage)
mem-ai auth login/logout Authenticate
mem-ai setup Interactive first-time setup
mem-ai install-hooks Add shell hooks to .zshrc/.bashrc
ai-memory-layer/
├── backend/ FastAPI + PostgreSQL + Celery
│ ├── app/
│ │ ├── api/v1/ REST endpoints (memory, analytics, auth)
│ │ ├── models/ SQLAlchemy models (Memory, AnalyticsLog, ...)
│ │ ├── services/ Business logic (retrieval, pipeline, pricing, analytics)
│ │ └── workers/ Celery async tasks
│ └── alembic/ Database migrations
├── cli/ Universal terminal CLI wrapper
│ └── mem_ai/
│ ├── providers/ Claude, OpenAI, Gemini, Ollama adapters
│ ├── cli.py Click CLI entry point
│ └── mcp_server.py MCP server for Claude Code
├── frontend/ Next.js dashboard
│ ├── app/analytics/ Economics dashboard page
│ └── components/
│ ├── memory/ Memory browser UI
│ └── analytics/ Token savings charts
├── extension/ Chrome/Firefox browser extension
└── paper/ Academic paper
├── memlayer.tex LaTeX source
└── memlayer.pdf Compiled PDF
The paper is at paper/memlayer.tex (compiled paper/memlayer.pdf).
MemLayer: A Universal Encrypted Memory Layer for Cross-Provider LLM Context Efficiency
This paper describes the system design, the retrieval algorithm, the economics model, and a projected cost analysis based on publicly available provider pricing. Empirical evaluation on real conversational data is ongoing.
- Fork the repo
- Create a feature branch
- Run tests:
cd backend && pytest - Open a pull request
MIT — see LICENSE.