Add Hebbian memory retrieval module#5

Open
sm1ly wants to merge 3 commits into LadybugDB:master from sm1ly:feature/hebbian-memory

Conversation


@sm1ly sm1ly commented Mar 26, 2026

Hebbian Memory Retrieval Module for LadybugDB api-server

Adds associative memory retrieval with Hebbian learning, cross-lingual search, and self-learning dictionary.

Features

  • Hebbian concept graph: Co-occurrence tracking with write-behind persistence (RAM → DB every 15s)
  • Cross-lingual search: Wiktionary dictionary (91k keys) + transliteration + Snowball stemming
  • IDF-weighted voting: Rare terms get heavier votes, common terms deprioritized
  • Three-layer dict lookup: exact O(1) → Snowball stem O(1) → trigram fuzzy ~2ms
  • Trigram fuzzy search: Dice coefficient over 91k dict keys, handles typos in both Cyrillic and Latin (кортошка→картошка, abliteraton→abliteration)
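Roughly, the trigram fuzzy lookup works like this (illustrative sketch only — names here are not the actual memory.js internals):

```javascript
// Build the set of character trigrams for a word, with padding so that
// leading/trailing characters contribute their own trigrams.
function trigrams(word) {
  const padded = `  ${word.toLowerCase()} `;
  const grams = new Set();
  for (let i = 0; i + 3 <= padded.length; i++) {
    grams.add(padded.slice(i, i + 3));
  }
  return grams;
}

// Dice coefficient: 2*|A∩B| / (|A| + |B|), in [0, 1].
function dice(a, b) {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  return (2 * shared) / (ta.size + tb.size);
}

// Best fuzzy match over dictionary keys, above a similarity threshold.
function fuzzyLookup(query, keys, threshold = 0.6) {
  let best = null;
  let bestScore = threshold;
  for (const key of keys) {
    const score = dice(query, key);
    if (score > bestScore) {
      bestScore = score;
      best = key;
    }
  }
  return best;
}
```

Since trigram sets are character-based, the same code handles Cyrillic and Latin alike: `fuzzyLookup('кортошка', ['картошка', 'яблоко'])` scores the typo at ~0.67 against `картошка` and returns it.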

Endpoints

  • POST /memory/retrieve — keyword extraction + cross-lingual expansion + voting search + Hebbian boost
  • POST /memory/decay — decay weights, prune weak concepts/edges
  • POST /memory/flush — force write-behind flush to DB
  • GET /memory/stats — tracker statistics

Schema (additive, zero risk to existing data)

  • Concept node table (id, weight, access_count, created, last_accessed)
  • COOCCURS rel table (Concept→Concept, weight, count)
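For illustration, the additive schema could be declared along these lines (sketch only — the exact LadybugDB DDL syntax and column types here are assumptions, not copied from the diff):

```cypher
// Sketch — verify against LadybugDB's actual DDL before use.
CREATE NODE TABLE Concept(
    id STRING,
    weight DOUBLE,
    access_count INT64,
    created TIMESTAMP,
    last_accessed TIMESTAMP,
    PRIMARY KEY (id)
);

CREATE REL TABLE COOCCURS(
    FROM Concept TO Concept,
    weight DOUBLE,
    count INT64
);
```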

Performance (after Bekka review fixes)

| Query | Before | After |
| --- | --- | --- |
| Short (cold) | 2-3s (Google API) | 28ms |
| Short (cached) | ~100ms | 3ms |
| Large prompt (cold) | 17s | 1.2s |
| Large prompt (cached) | ~100ms | 5ms |

Dependencies

  • snowball-stemmers — pure JS Snowball stemmer (Russian + English), 0 deps
  • dict_ru_en.json — 91k bidirectional Wiktionary translation pairs (4.3MB)

Review fixes (628f076)

  1. Removed Google Translate fallback — eliminated network latency, replaced with local trigram fuzzy search
  2. Replaced 65% prefix stemming with Snowball stemmers (proper morphological normalization)
  3. Capped IDF weight at 0.5 to prevent single-hit typo dominance
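The IDF cap from fix 3 amounts to clamping the vote denominator (illustrative sketch — the real memory.js may normalize differently):

```javascript
// Capped IDF vote weight: a minimum denominator of 2 caps the weight at
// 0.5, so a term seen only once (often a typo) cannot dominate the vote.
function idfWeight(docFreq) {
  return 1 / Math.max(2, docFreq);
}
```

With this, a single-hit typo and a term seen twice carry the same weight (0.5), while genuinely common terms still get deprioritized (e.g. `idfWeight(10)` is 0.1).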

sm1ly added 3 commits March 26, 2026 11:50

Smart memory retrieval system with cross-lingual search capabilities.

New endpoint: POST /memory/retrieve
- Frequency-based keyword extraction from user text
- Translation cascade: Wiktionary dictionary (91k bidirectional keys)
  → ru↔en transliteration → Google Translate auto-supplement
- Cross-lingual clustering with 65% prefix stemming
- IDF-weighted voting search (rare terms = heavier votes)
- Hebbian co-occurrence tracking (COOCCURS edges between Concept nodes)
- Write-behind buffer: RAM accumulator → LadybugDB flush every 15s
- 2-level cache: expand cache + stem query cache (TTL 60s)

Additional endpoints:
- POST /memory/decay — apply decay to Hebbian weights, prune weak
- POST /memory/flush — force write-behind flush
- GET /memory/stats — tracker statistics

Files:
- memory.js: retrieval pipeline, Hebbian tracker, voting search
- dict_ru_en.json: Wiktionary FreeDict ru↔en translation dictionary
- index.js: mount /memory router (1 line added)
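The write-behind pattern from this commit can be sketched as a RAM accumulator with a periodic drain (hypothetical names — the real tracker lives in memory.js and persists COOCCURS edges):

```javascript
// Minimal write-behind sketch: Hebbian updates accumulate in RAM and are
// drained to the DB on an interval (15s in the actual module).
class HebbianTracker {
  constructor(persist, intervalMs = 15_000) {
    this.dirty = new Map();   // "a|b" edge key -> accumulated co-occurrence count
    this.persist = persist;   // async (key, count) => write COOCCURS edge to DB
    this.timer = setInterval(() => this.flush(), intervalMs);
  }

  // Record one co-occurrence in RAM; nothing touches the DB here.
  cooccur(a, b) {
    const key = `${a}|${b}`;
    this.dirty.set(key, (this.dirty.get(key) ?? 0) + 1);
  }

  // Drain the RAM buffer to the DB (also exposed via POST /memory/flush).
  async flush() {
    const batch = [...this.dirty];
    this.dirty.clear();
    for (const [key, count] of batch) {
      await this.persist(key, count);
    }
  }

  stop() { clearInterval(this.timer); }
}
```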

Bekka review fixes for PR LadybugDB#5:

1. Remove Google Translate fallback entirely — eliminates network
   latency spikes on dict misses. Replace with three-layer local
   dict lookup: exact O(1) → Snowball stem O(1) → trigram fuzzy ~2ms.

2. Replace naive 65% prefix stemming with Snowball stemmers (Russian +
   English). Proper morphological normalization instead of crude
   truncation that broke short words (шел→ше, шла→шл).

3. Cap IDF weight at 0.5 (min denominator 2) to prevent single-hit
   typo/rare-term dominance in voting search results.

Trigram index (Dice coefficient) handles typos in both Cyrillic and
Latin: кортошка→картошка, яблоука→яблоко, abliteraton→abliteration.

Performance: big prompt 17sec → 1.2sec (cold), 100ms → 5ms (cached).
Dependencies: +snowball-stemmers (pure JS, 24 languages, 0 deps).

Two bugs caused an infinite retry loop and a 7.6GB memory leak over 10h:

1. Undirected DELETE pattern -[r:COOCCURS]- not supported by LadybugDB.
   Changed to directed -[r:COOCCURS]->.

2. Batch try/catch re-added ALL dirty items on any single failure.
   Now each item has its own try/catch — failed items retry only if
   they still exist in RAM (prevents ghost retry of evicted/pruned items).
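The fixed flush loop from point 2 can be sketched as follows (illustrative — the actual code is in memory.js):

```javascript
// Per-item flush: each item gets its own try/catch, and a failed item is
// re-marked dirty only if the concept still exists in the RAM tracker,
// so evicted/pruned items cannot ghost-retry forever.
async function flushDirty(dirty, ram, persist) {
  for (const [key, delta] of [...dirty]) {
    dirty.delete(key);                 // optimistic: remove before writing
    try {
      await persist(key, delta);      // e.g. a directed MERGE ...-[r:COOCCURS]->
    } catch (err) {
      if (ram.has(key)) {             // skip ghosts: item no longer in RAM
        dirty.set(key, (dirty.get(key) ?? 0) + delta);
      }
    }
  }
}
```

On a DB outage every live item is retained for the next cycle, while anything already evicted from RAM is dropped instead of retried, which is what breaks the infinite-retry/leak cycle described above.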