Python client for the HEBBS cognitive memory engine. Provides an async gRPC client with a Pythonic interface -- no protobuf in the public API.
HEBBS gives your AI agents real memory: sub-10ms writes, sub-10ms recalls, four recall strategies, entity-scoped multitenancy, and GDPR-compliant erasure. This SDK wraps the gRPC API in idiomatic Python so you can integrate HEBBS in minutes.

    curl -sSf https://hebbs.ai/install | sh

This installs hebbs-server, hebbs-cli, and hebbs-bench to ~/.hebbs/bin/. The server runs BGE-small-en-v1.5 (ONNX) for embeddings internally -- no external embedding API needed.

    hebbs-server

The server listens on localhost:6380 (gRPC) and localhost:6381 (REST) by default. On first start it prints a bootstrap API key -- save it.

    pip install hebbs

To include the demo app and all LLM providers:

    pip install "hebbs[demo]"

With the server running, connect and exercise the core operations:

    import asyncio

    from hebbs import HebbsClient


    async def main():
        # api_key falls back to HEBBS_API_KEY env var if not passed
        async with HebbsClient("localhost:6380", api_key="hb_...") as h:
            # Store a memory (entity-scoped)
            mem = await h.remember(
                content="Acme Corp uses Salesforce CRM and has 200 engineers",
                importance=0.8,
                context={"company": "Acme Corp", "topic": "tech_stack"},
                entity_id="acme_corp",
            )

            # Recall by semantic similarity
            results = await h.recall(
                cue="What CRM does Acme use?",
                strategies=["similarity"],
                top_k=5,
                entity_id="acme_corp",
            )
            for r in results.results:
                print(f" [{r.score:.2f}] {r.memory.content}")

            # Prime a session (load relevant context for an entity)
            prime = await h.prime(entity_id="acme_corp", max_memories=50)
            print(f"Primed {len(prime.results)} memories")

            # Generate insights from memory clusters (uses LLM server-side)
            reflect = await h.reflect(entity_id="acme_corp")
            print(f"Created {reflect.insights_created} insights")

            # GDPR-compliant cryptographic erasure
            forget = await h.forget(entity_id="acme_corp")
            print(f"Forgot {forget.forgotten_count} memories")


    asyncio.run(main())

| Method | Description |
|---|---|
| remember(content, importance, context, entity_id, edges) | Store a memory |
| get(memory_id) | Retrieve a memory by ID (bytes) |
| recall(cue, strategies, top_k, entity_id, scoring_weights, cue_context) | Multi-strategy recall |
| prime(entity_id, max_memories, similarity_cue, scoring_weights) | Session warm-up |
| revise(memory_id, content, importance, context, entity_id) | Update a memory |
| forget(entity_id, memory_ids) | GDPR-compliant erasure (by entity or by IDs) |
| set_policy(max_snapshots_per_memory, auto_forget_threshold, decay_half_life_days) | Configure tenant policies |
| subscribe(entity_id, confidence_threshold) | Real-time memory surfacing |
| reflect(entity_id) | Generate insights from clusters (LLM server-side) |
| insights(entity_id, max_results) | Retrieve accumulated insights |
| health() | Server health check |
| count() | Total memory count |

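The quick start does not exercise revise, get, or count. The following is a minimal sketch of how they might be combined, based only on the parameter names listed in the table. Passing those names as keyword arguments, get returning an object with a content attribute (as recall results do), and count returning an integer are all assumptions rather than documented behavior. The function name refresh_memory is hypothetical.

```python
async def refresh_memory(h, memory_id: bytes, new_content: str) -> tuple[str, int]:
    """Revise one memory, read it back, and report the total memory count.

    `h` is expected to be a connected HebbsClient, or any object exposing
    the same async revise/get/count methods.
    """
    # Assumption: revise accepts the table's parameter names as keywords.
    await h.revise(memory_id=memory_id, content=new_content)
    # Assumption: get returns a memory object exposing `.content`.
    mem = await h.get(memory_id)
    # Assumption: count returns a plain integer.
    total = await h.count()
    return mem.content, total
```

Inside the quick-start session this could be called as await refresh_memory(h, mem.id, "Acme Corp migrated to HubSpot"), where mem is the value returned by remember.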
- similarity -- semantic vector search (HNSW, entity-filtered)
- temporal -- time-ordered retrieval
- causal -- cause-and-effect graph traversal
- analogical -- cross-domain pattern matching (blends embedding + structural similarity)
Pass strategy names as strings for basic usage. For advanced tuning, pass RecallStrategyConfig objects. You can mix both in the same call:

    from hebbs import RecallStrategyConfig

    results = await h.recall(
        cue="What happened with Acme?",
        strategies=["temporal", RecallStrategyConfig("similarity", top_k=3, ef_search=200)],
        entity_id="acme_corp",
    )

RecallStrategyConfig carries the per-strategy tuning parameters for advanced recall. Most users should just pass strategy names as strings; reach for it when you need fine-grained control.

| Field | Type | Default | Used By | Description |
|---|---|---|---|---|
| strategy | str | (required) | all | Strategy name: "similarity", "temporal", "causal", "analogical" |
| entity_id | str \| None | None | all | Override entity scope for this strategy |
| top_k | int \| None | None | all | Per-strategy result limit (separate from the top-level top_k) |
| ef_search | int \| None | 50 | similarity | HNSW candidate count. Higher = more accurate, slower. |
| time_range | tuple[int, int] \| None | None (unbounded) | temporal | (start_us, end_us) microsecond timestamps. When omitted, returns all memories newest-first. |
| seed_memory_id | bytes \| None | None (auto) | causal | Starting node for graph traversal. When omitted, the engine picks the best seed. |
| max_depth | int \| None | 5 (max 10) | causal | Maximum hops in graph traversal. |
| edge_types | list[EdgeType] \| None | None (all) | causal | Restrict traversal to specific edge types. |
| analogical_alpha | float \| None | 0.5 | analogical | Blend weight: 0.0 = pure structural, 1.0 = pure embedding similarity. |

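The time_range field takes microsecond timestamps, which are easy to get wrong by a factor of 1,000 when starting from seconds or milliseconds. Below is a small sketch of one way to build the tuple from datetime objects. The helper names to_us and time_range_us are hypothetical and not part of the SDK, and the sketch assumes the timestamps are microseconds since the Unix epoch in UTC, which the table does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_us(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer microseconds since the epoch."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Floor-dividing by a 1-microsecond timedelta yields an exact int,
    # avoiding the rounding error of float-based timestamp() arithmetic.
    return (dt - _EPOCH) // timedelta(microseconds=1)


def time_range_us(start: datetime, end: datetime) -> tuple[int, int]:
    """Build a (start_us, end_us) tuple, rejecting inverted ranges."""
    start_us, end_us = to_us(start), to_us(end)
    if start_us > end_us:
        raise ValueError("start must not be after end")
    return (start_us, end_us)
```

The resulting tuple would then be passed as the time_range argument of a RecallStrategyConfig for the temporal strategy.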
Causal recall -- trace cause-and-effect chains from a seed memory:

    from hebbs import RecallStrategyConfig, EdgeType

    results = await h.recall(
        cue="What led to the pricing pushback?",
        strategies=[
            RecallStrategyConfig(
                "causal",
                seed_memory_id=mem.id,
                max_depth=3,
                edge_types=[EdgeType.CAUSED_BY, EdgeType.FOLLOWED_BY],
            )
        ],
    )

Analogical recall -- find structurally similar patterns across entities:

    results = await h.recall(
        cue="enterprise CRM evaluation",
        strategies=[RecallStrategyConfig("analogical", analogical_alpha=0.7)],
        cue_context={"industry": "technology", "stage": "evaluation"},
        top_k=5,
    )

Recall results are ranked by a composite score blending relevance, recency, importance, and reinforcement. Pass scoring_weights to tune the blend:

    from hebbs import ScoringWeights

    # Pure semantic match
    results = await h.recall(
        cue="competitor pricing",
        scoring_weights=ScoringWeights(w_relevance=1.0, w_recency=0.0, w_importance=0.0, w_reinforcement=0.0),
    )

    # Recency-biased -- "what just happened?"
    results = await h.recall(
        cue="latest updates",
        scoring_weights=ScoringWeights(w_relevance=0.2, w_recency=0.8, w_importance=0.0, w_reinforcement=0.0),
    )

    # Also works as a plain dict
    results = await h.recall(
        cue="latest updates",
        scoring_weights={"w_relevance": 0.2, "w_recency": 0.8, "w_importance": 0.0, "w_reinforcement": 0.0},
    )

Omit scoring_weights for the default blend (relevance 0.5, recency 0.2, importance 0.2, reinforcement 0.1).

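To build intuition for how the weights shift rankings, here is a rough sketch that treats the composite score as a plain weighted sum of four signals in [0, 1]. That linear form is an assumption: the exact formula the server uses is not documented here. The Weights class is a hypothetical stand-in for ScoringWeights so the snippet runs without a server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical stand-in mirroring the four ScoringWeights fields."""
    w_relevance: float = 0.5
    w_recency: float = 0.2
    w_importance: float = 0.2
    w_reinforcement: float = 0.1


def composite_score(relevance, recency, importance, reinforcement, w=Weights()):
    """Weighted sum of four signals (assumed form, not the server's formula)."""
    return (
        w.w_relevance * relevance
        + w.w_recency * recency
        + w.w_importance * importance
        + w.w_reinforcement * reinforcement
    )


# An older but highly relevant memory vs. a fresh, loosely related one.
old_relevant = dict(relevance=0.9, recency=0.1, importance=0.5, reinforcement=0.0)
fresh_loose = dict(relevance=0.4, recency=0.9, importance=0.5, reinforcement=0.0)

recency_biased = Weights(w_relevance=0.2, w_recency=0.8, w_importance=0.0, w_reinforcement=0.0)
default_order = composite_score(**old_relevant) > composite_score(**fresh_loose)
biased_order = composite_score(**old_relevant, w=recency_biased) > composite_score(**fresh_loose, w=recency_biased)
```

Under the default blend the older, more relevant memory ranks first (about 0.57 against 0.48). With the recency-biased weights the order flips (about 0.26 against 0.80).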
The server generates a bootstrap API key on first start and prints it to stderr. Pass it to the client:

    async with HebbsClient("localhost:6380", api_key="hb_...") as h:
        ...

Or set the HEBBS_API_KEY environment variable and omit api_key -- the SDK picks it up automatically. To explicitly connect without auth, pass api_key="".

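The precedence described above (explicit argument, then environment variable, with an empty string meaning no auth) can be sketched as follows. This illustrates the documented behavior and is not the SDK's actual code. The function name resolve_api_key is hypothetical, and treating an empty HEBBS_API_KEY value the same as an unset one is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> Optional[str]:
    """Return the key to send, or None for an unauthenticated connection."""
    if api_key is None:
        # Not passed at all: fall back to the environment.
        api_key = os.environ.get("HEBBS_API_KEY")
    # An empty string (explicit, or from the environment) means "no auth".
    return api_key or None
```

Distinguishing None (not passed) from "" (deliberately empty) is what lets a caller opt out of auth even when HEBBS_API_KEY is set in the environment.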
All HEBBS operations are scoped by entity_id. Memories stored under one entity are never returned when querying a different entity -- this applies to all four recall strategies, prime, reflect, and insights. No configuration needed; isolation is structural.

    await h.remember(content="Uses Salesforce", entity_id="acme_corp")
    await h.remember(content="Uses HubSpot", entity_id="techflow_inc")

    results = await h.recall(cue="What CRM?", entity_id="acme_corp")
    # Only returns "Uses Salesforce" -- techflow_inc data is invisible

The demo ships an AI sales agent ("Atlas") that uses HEBBS for memory-augmented conversations. It shows every HEBBS operation in real-time panels: remember latency, recall scores, prime context, and session metrics.

    # Pick one:
    export GEMINI_API_KEY="your-key"     # Gemini (default)
    export OPENAI_API_KEY="your-key"     # OpenAI
    export ANTHROPIC_API_KEY="your-key"  # Anthropic
    # Or use Ollama / mock (no keys needed)

    hebbs-demo interactive

Or specify a config:

    hebbs-demo interactive --config gemini-vertex   # Gemini via Vertex AI
    hebbs-demo interactive --config gemini          # Gemini via API key
    hebbs-demo interactive --config openai          # GPT-4o
    hebbs-demo interactive --config local           # Ollama (no API key)
    hebbs-demo interactive --mock-llm               # Mock LLM (no API key)

Switch entities mid-session to demonstrate multitenancy:

    hebbs-demo interactive --entity-id acme_corp
    # In-session: /session techflow_inc

To run the scripted scenarios instead:

    hebbs-demo scenarios --all                  # Run all 7 scenarios
    hebbs-demo scenarios --run discovery_call   # Run a specific one

Available scenarios: discovery_call, objection_handling, multi_session, reflect_learning, subscribe_realtime, forget_gdpr, multi_entity.

| Provider | Config | Env Variable |
|---|---|---|
| Gemini (Vertex AI) | gemini-vertex.toml | GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_CLOUD_PROJECT |
| Gemini (API key) | gemini.toml | GEMINI_API_KEY |
| OpenAI | openai.toml | OPENAI_API_KEY |
| Anthropic | (custom toml) | ANTHROPIC_API_KEY |
| Ollama | local.toml | (none -- Ollama must be running) |

- Python >= 3.10
- A running HEBBS server (gRPC on port 6380)
Contributions are welcome. By submitting a pull request, you agree to the Contributor License Agreement.
Copyright 2025 Parag Arora. Apache 2.0 -- see LICENSE.