20 Mar 11:15

RMANOV

10f6250

Latest

┌─────────────────────────────────────────────────────────────────┐
│  SmartKey v0.6.0 — "The Brain"                                  │
│                                                                 │
│  22 commits · 45 files · +6,474 lines · 311 tests              │
│  Master Algorithm · Regime Detection · CapsEngine · Frustration │
│  CorrectionMemory · TechVocab · LangModel · FFI Protocol       │
└─────────────────────────────────────────────────────────────────┘

TL;DR

v0.5.0 taught SmartKey to detect your keyboard layout. v0.6.0 teaches it to understand you. A new proactive Master Algorithm observes your typing patterns, classifies your current regime (fast coding? deliberate prose? language switching?), tracks frustration signals, remembers your corrections, applies smart capitalisation — and adjusts every prediction threshold in real-time. The dual-buffer engine from v0.5.0 is now wrapped in a 4-phase state machine that anticipates, tracks, corrects, and learns. 311 tests verify it all.

Highlights

🧠 Proactive Master Algorithm (`master_loop.rs` — 750 LoC)

The centrepiece of v0.6.0. A decorator-pattern orchestrator wrapping InputMethodCore with a 4-phase state machine:

ANTICIPATING → TRACKING → CORRECTING → LEARNING
     │              │            │            │
     ▼              ▼            ▼            ▼
 Read context   Follow word   Detect bad   Update profile
 Set lang prior  Build ghost   prediction   Save corrections
 Bias ensemble   Monitor WPM   Suppress it  Adjust thresholds

Key innovation: Hints injection. Before each prediction, MasterLoop computes a Hints struct (lang_prior, suppress_ghost, confidence_boost) and passes it down to the ensemble. This means predictions adapt to context before the user finishes typing — not after.

📊 Typing Regime Detection (`typing_regime.rs` — 200 LoC)

Inspired by market regime detection in algorithmic trading. Five EMA signals (WPM, word length, out-of-vocabulary rate, dual-buffer flip rate, language ratio) are tracked per word and classified into 5 regimes:

Regime	Trigger	Effect
`FastCoding`	High WPM + OOV + short words	TechVocab δ weight ↑, ghost threshold ↓
`DeliberateProse`	Low WPM + long words	Markov β weight ↑, longer ghost text
`MixedLanguage`	High flip rate	Dual-buffer stays unlocked longer
`NewTopic`	OOV spike	Reset session cache, broaden predictions
`LanguageSwitch`	Sharp lang ratio flip	Lock to new language faster

🔠 CapsEngine (`caps.rs` — 100 LoC)

The trie stores all words lowercase. CapsEngine detects the caps pattern from what you've typed so far and applies it to predictions:

Normal — first-letter capitalisation at sentence start
AllCaps → HELLO
CamelCase → camelCase
PascalCase → PascalCase

Proper nouns are capitalised regardless. This is why the CI tests needed eq_ignore_ascii_case — predictions are no longer always lowercase.

😤 Frustration Detector (`frustration.rs` — 200 LoC)

Real-time keystroke pattern monitor detecting 4 frustration signals:

REJECT — Backspace within 500ms of Tab acceptance → you didn't want that prediction
RAPID_DELETE — 3+ Backspace in <1 second → something went wrong
RETYPE — Delete all + retype in different script → wrong language was locked
ABANDON — Escape + continued manual typing → user gave up on predictions

Each signal carries severity [0.0, 1.0]. When frustration spikes, ghost text is suppressed for the next N words. The Master Algorithm's LEARNING phase uses these signals to adjust LightProfile.confidence_floor.

🔧 Tech Vocabulary (`tech_vocab.rs` — 150 LoC)

300+ hardcoded technical terms (programming keywords, framework names, CLI commands) with frequency-based scoring. Active as the δ weight in the ensemble only during FastCoding regime — prevents function from appearing while writing Bulgarian prose.

🧬 Per-Language Model Bank (`lang_model.rs` — 100 LoC)

Each detected language gets its own isolated model bank (trie + Markov + PPM + Kneser-Ney). Prevents cross-language contamination: Bulgarian Markov chains don't bleed into English predictions and vice versa.

🔄 Correction Memory (`correction_memory.rs` — 150 LoC)

When the user corrects the same prediction 3+ times in the same context, the bad prediction is suppressed and the correction is shown instead. Uses 2-word context hashing and LRU eviction (max 500 entries). Serialises to PersonalProfile v4 for persistence across sessions.

📡 FFI Protocol (`ffi_protocol.rs` — 80 LoC)

Single source of truth for all FFI serialisation: ReplaceWord payloads use \x1F (Unit Separator) between replace_len and text; ShowComposing uses \x00 between typed and ghost portions. Shared by PyO3, C FFI, and test harnesses — eliminates parser mismatches.

👤 Light Profile (`light_profile.rs` — 100 LoC)

Adaptive confidence tuning based on rolling accept/reject ratio:

High acceptance → lower confidence floor → more ghost suggestions shown
Low acceptance → higher floor → fewer, higher-quality suggestions
Formula: max(0.15, 0.50 - 0.35 × accept_rate)
Auto-suppresses ghost text for 2 words after a rejection

🎮 Playground Simulation Harness (`smartkey-playground` — new crate)

Closed-loop simulator driving InputMethodCore through target text, measuring keystroke efficiency, prediction accuracy, and ghost acceptance rates. Includes preset scenarios and HTML report generation.

Architecture

v0.6.0 adds a new orchestration layer above the existing prediction stack:

┌──────────────────────────────────────────────────────┐
│  MasterLoop (new)                                     │
│  ┌─ RegimeDetector ── FrustrationDetector            │
│  ├─ LightProfile ──── CorrectionMemory               │
│  └─ Hints → InputMethodCore                          │
│              ┌─ DualBuffer (v0.5.0)                  │
│              ├─ CapsEngine (new)                     │
│              └─ Ensemble                             │
│                  ├─ α  Corpus frequency              │
│                  ├─ β  Markov chain                   │
│                  ├─ γ  CVM counter                    │
│                  └─ δ  TechVocab (new, regime-gated) │
└──────────────────────────────────────────────────────┘

Preedit-First Dual Buffer

input.rs refactored from 1,168 → 1,824 LoC. The dual buffer now operates in preedit mode: text is held in a composition buffer before commit, eliminating character doubling that plagued v0.5.0. Ghost text comparison is now Unicode-safe.

Space Auto-Commit + Syntactic Casing

Pressing Space now auto-commits the current word (no explicit Tab needed for unambiguous predictions). After commit, CapsEngine applies casing rules: sentence-start capitalisation, proper noun detection, regime-based transforms.

By the Numbers

Metric	v0.5.0	v0.6.0
Files changed	—	45
Lines added	—	+6,474
Lines removed	—	-358
Net growth	—	+6,116 LoC
New modules	—	10
New crate	—	`smartkey-playground`
`master_loop.rs`	—	750 LoC (new)
`typing_regime.rs`	—	200 LoC (new)
`frustration.rs`	—	200 LoC (new)
`tech_vocab.rs`	—	150 LoC (new)
`correction_memory.rs`	—	150 LoC (new)
`caps.rs`	—	100 LoC (new)
`lang_model.rs`	—	100 LoC (new)
`light_profile.rs`	—	100 LoC (new)
`ffi_protocol.rs`	—	80 LoC (new)
`ensemble.rs`	~670 LoC	1,394 LoC (+108%)
`input.rs`	~1,168 LoC	1,824 LoC (+56%)
`virtual_typer.rs`	—	881 LoC (new test)
Tests	207	311 (+50%)

Full Changelog

Features

feat(core): master_loop.rs — proactive 4-phase Master Algorithm with Hints injection (750 LoC)
feat(core): typing_regime.rs — 5-regime detection via EMA signals (200 LoC)
feat(core): caps.rs — CapsEngine: smart capitalisation from typed prefix (100 LoC)
feat(core): tech_vocab.rs — 300+ technical terms as ensemble δ signal (150 LoC)
feat(core): frustration.rs — 4-signal frustration detector (200 LoC)
feat(core): lang_model.rs — per-language model isolation bank (100 LoC)
feat(core): correction_memory.rs — repeated correction suppression with LRU (150 LoC)
feat(core): light_profile.rs — adaptive confidence floor tuning (100 LoC)
feat(core): ffi_protocol.rs — unified FFI serialisation contract (80 LoC)
feat(core): preedit-first dual buffer — eliminate character doubling
feat(core): aggressive space auto-commit and syntactic casing restoration
feat(core): per-language models + smart caps for 2× prediction accuracy
feat(core): confidence hardening + eval histogram
feat(playground): closed-loop simulation harness for prediction quality eval

Fixes

fix(core): bug hunt v15 — 10 fixes + 4 simplifications in Master Algorithm
fix(core): wire flip detection, clear momentum on reset, harden ghost text
fix(core): Unicode-safe ghost text comparison, remove dead sentence_start field
fix(core): resolve compile errors and ghost text regression
fix(core): case-insensitive assertions in integration + virtual_typer tests
fix: ReplaceWord parser mismatch, macOS handler, overflow guards
fix: confidence model, overflow guards, code quality improvements
fix: parallel surgery — preedit lifecycle, language detection, prediction quality

Chores

chore(core): remove applied caps.patch — history preserved in git blame
refactor(core): simplify hypothesis phase checks and reduce allocations
ci: add smartkey-playground to test and clippy jobs

Known Issues / Tech Debt

Confidence single-candidate weakness: Softmax normalisation gives confidence=1.0 to a single low-score prediction. Mitigated by ghost_text_min_score threshold, but a pro...

Assets 2

14 Mar 10:19

RMANOV

v0.5.0

f5e8a5b

v0.5.0 — "Never Switch Again"

┌─────────────────────────────────────────────────────────────────┐
│  SmartKey v0.5.0 — "Never Switch Again"                         │
│                                                                 │
│  12 commits · 27 files · +1,985 lines · 207 tests              │
│  Dual-buffer · evdev keymaps · IBus replace · layout-agnostic   │
└─────────────────────────────────────────────────────────────────┘

TL;DR

Type zdravej on a QWERTY keyboard. SmartKey intercepts the hardware scancodes, maintains a parallel Bulgarian Phonetic interpretation in a second buffer, compares corpus frequencies, decides you meant Bulgarian, and outputs здравей — without you pressing Alt+Shift, Ctrl+Space, or touching any language-switch shortcut. That is v0.5.0 in one sentence.

Highlights

🔄 Dual-Buffer Layout-Agnostic Input (`dual_buffer.rs` — 399 LoC)

The centrepiece of this release. Architecture:

Hardware key press
        │
        ▼
   evdev scancode
   (platform-neutral)
        │
   ┌────┴────┐
   ▼         ▼
 EN QWERTY  BG Phonetic
  buffer     buffer
   │         │
   └────┬────┘
        │
   corpus frequency
   comparison
        │
   confidence ≥ 0.7?
        │
   ┌────┴────────────────┐
   ▼                     ▼
 lock + output        keep accumulating
 ReplaceWord action   both buffers

Key design decisions:

Confidence threshold 0.7 — SmartKey doesn't guess until it's sure. Below 0.7 both buffers run in parallel, above 0.7 the winning language is locked and ReplaceWord is emitted.
Flip detection — if you suddenly type characters that only exist in the non-locked language, SmartKey unlocks, reassesses, and can flip. There's no permanent lock-in.
ReplaceWord action — the existing Action enum gains a new variant. All downstream crates (win, py, ibus) handle it.

⌨️ evdev Keymap Tables (`keymap.rs` — 323 LoC)

Previous builds used XKB key codes in lookup tables — which are already layout-remapped. This made dual-buffer impossible (you can't detect layout from remapped codes). v0.5.0 fixes this at the root: lookup tables now use raw evdev scancodes (Linux kernel constants), which are hardware positions, not characters.

Two complete lookup tables:

EN_QWERTY_MAP: evdev scancode → ASCII character (+ Shift variant)
BG_PHONETIC_MAP: evdev scancode → Cyrillic character (+ Shift variant), following the Bulgarian Phonetic standard layout

🔁 IBus Replace Handler (`ibus/`)

The IBus adapter gains a replace action handler: when ReplaceWord arrives, it erases the N characters in the composition buffer and commits the corrected text. Bug fixes in this path:

Off-by-one in backspace count (was deleting one extra character)
Wrong CapsLock bit in key event synthesised for backspace
Consumed-key event was being committed as text

🌡️ Cold-Start Wrong-Layout Detection

On session start SmartKey has no typed history — the confidence estimator starts blind. v0.5.0 adds a frequency-based cold-start heuristic: compare the character frequency distribution of the current input prefix against expected EN and BG frequencies. Whichever distribution is less surprising wins the initial guess.

Prior approach used raw character counts, which failed on short prefixes. Frequency-ratio comparison is stable at 2–3 characters.

🛠️ ReplaceWord Across All Crates

The new Action::ReplaceWord(String) variant is handled in:

smartkey-core — emitted by DualBuffer::process_key
smartkey-py (PyO3 bridge) — exposed as ("replace", text) tuple in Python API
smartkey-win (TSF) — Key::RawCode arm added to should_claim match; TSF composition handles replace
smartkey-ibus — replace handler implemented with corrected backspace synthesis

🦀 Rust 1.94 Clippy Compliance

New lints in Rust 1.94:

is_multiple_of — replaces manual % n == 0 patterns
unnecessary_unwrap — replaces if x.is_some() { x.unwrap() } with if let Some(v) = x
CI now runs on Rust stable with -D warnings enforced

By the Numbers

Metric	v0.4.0	v0.5.0
Files changed	—	27
Lines added	—	+1,985
Lines removed	—	-152
Net growth	—	+1,833 LoC
`dual_buffer.rs`	—	399 LoC (new)
`keymap.rs`	—	323 LoC (new)
Tests	~150	207 (+38%)
Language switch shortcuts needed	Yes	No
Rust version compliance	1.8x	1.94
CI Node.js	actions/checkout v4	v5 (Node 24)

Known Issues / Tech Debt

Being honest about what v0.5.0 does not yet solve:

macOS: Dual-buffer is implemented in smartkey-core but the macOS IMK adapter does not yet plumb ReplaceWord through to the Swift layer. macOS gets prediction quality improvements from v0.4.0 but not layout-agnostic input.
Windows: Key::RawCode arm added; however the evdev scancode concept is Linux-specific. Windows uses virtual key codes, so the keymap.rs tables don't apply directly. The Windows path uses the existing ToUnicodeEx approach — a unified scancode abstraction across platforms is planned.
Performance: dual_buffer.rs is O(n) in prefix length per keypress. For normal typing (< 50 chars) this is imperceptible. Long pastes trigger one full re-evaluation.
Confidence tuning: The 0.7 threshold is hardcoded. A config parameter is planned.
Single Cyrillic keyboard layout: Only Bulgarian Phonetic is in keymap.rs. Russian Phonetic, Serbian Cyrillic, Ukrainian, etc. are not yet represented.

Full Changelog

Features

feat(core): dual_buffer.rs — DualBuffer struct, confidence locking, flip detection, ReplaceWord emission (399 LoC)
feat(core): keymap.rs — evdev scancode lookup tables for EN QWERTY + BG Phonetic (323 LoC)
feat(core): Action::ReplaceWord(String) variant — handled in all downstream crates
feat(core): instant language detection + language-aware scoring (v0.4.1 backport)

Fixes

fix(core): Use evdev keycodes instead of XKB in keymap lookup tables — enables true layout-agnostic detection
fix(core): Allow wrong-layout detection at cold start — frequency-based comparison replaces broken count-based heuristic
fix(core): Replace broken count-based layout detection with frequency-ratio comparison
fix(core): Handle Action::ReplaceWord in all downstream crates
fix(core): v0.4.1 post-review cleanup — all clippy warnings resolved
fix(ibus): Add replace handler — erases composition buffer and commits corrected text
fix(ibus): Fix replace action — off-by-one backspace count, wrong CapsLock bit, consumed-key commit
fix(win): Add Key::RawCode arm to TSF should_claim match
fix(ci): Resolve Rust 1.94 clippy lints (is_multiple_of, unnecessary_unwrap)
chore(ci): Bump actions/checkout v4 → v5 (Node.js 24)

What's Next (v0.6.0 and beyond)

macOS: plumb ReplaceWord through to IMK Swift layer
Windows: unified scancode abstraction (virtual key → evdev mapping)
Config file: expose confidence_threshold, flip_sensitivity, feature flags
Additional Cyrillic layouts: Russian Phonetic, Ukrainian, Serbian
GTK4 popup panel — Alt-hold to see full candidate list
Memory-mapped binary corpus format (eliminate JSON parse at startup)
Windows MSI installer + macOS .app bundle
Mobile: Android IME via JNI

Install

git clone https://github.com/RMANOV/smartkey.git
cd smartkey
cargo build --release
cargo test --workspace   # 207 tests

Linux (IBus — with layout-agnostic input):

maturin develop -m crates/smartkey-py/Cargo.toml --release
# Start typing — SmartKey detects your intended language automatically

Windows (TSF): cargo build -p smartkey-win --release → register DLL (HKCU, no elevation)
macOS (IMK): cargo build -p smartkey-mac --release → link from Swift

Full diff: v0.4.0...v0.5.0

Assets 2

14 Mar 10:19

RMANOV

v0.4.0

8391926

v0.4.0 — "Algorithmic Intelligence"

┌─────────────────────────────────────────────────────────────────┐
│  SmartKey v0.4.0 — "Algorithmic Intelligence"                   │
│                                                                 │
│  2 commits · 18 files · +2,333 lines · ~150+ tests             │
│  8 new prediction modules · PPM · BPE · Kneser-Ney · Hedge     │
└─────────────────────────────────────────────────────────────────┘

TL;DR

v0.4.0 is SmartKey's largest algorithmic leap. Eight new prediction modules land in one release: Unicode-based language detection, per-language vocabulary counters, character-level PPM compression, BPE subword tokenisation, modified Kneser-Ney smoothing, an adaptive ensemble mixer, acceptance-rate telemetry, and session-burst caching. This is the foundation for multilingual prediction without configuration.

Highlights

🌐 Instant Language Detection (`lang_detect.rs` — 216 LoC)

Unicode block classification: every character has a block (Latin, Cyrillic, Greek, Arabic, CJK…). SmartKey reads the block distribution of your last N keystrokes and infers the active language in microseconds — no ML model, no external corpus, no network call.

📊 Per-Language CVM Counters (`lang_cvm.rs` — 182 LoC)

Each detected language gets its own CVM streaming counter. As you type Bulgarian the BG counter accumulates; switch to English and the EN counter takes over. Vocabulary boosting is always language-contextual — you never get Bulgarian suggestions while typing English.

⚡ Session Burst Cache (`session_cache.rs` — 194 LoC)

Words typed this session are tracked with a burstiness weight: repeat a word three times in a row and it jumps to the front of suggestions immediately. The cache is in-memory only — fast, zero-persistence overhead.

📐 Prediction by Partial Matching (`ppm.rs` — 323 LoC)

PPM is a character-level context-compression algorithm (the same family as LZ77). SmartKey uses PPM-C: it maintains an order-5 context trie, falls back through order-4 → 3 → 2 → 1 → 0 → escape, and produces calibrated character probabilities. This is opt-in behind a feature flag (ppm).

🔤 BPE Subword Tokeniser (`bpe.rs` — 263 LoC)

Byte Pair Encoding builds a vocabulary of subword units from corpus merge rules. SmartKey's BPE tokeniser handles out-of-vocabulary words by decomposing them into known subwords — useful for technical terms, compound words, and transliteration fragments. Feature flag: bpe.

📏 Modified Kneser-Ney Smoothing (`kneser_ney.rs` — 247 LoC)

Standard trigram models assign zero probability to unseen sequences — Kneser-Ney fixes this by redistributing probability mass to contexts where a word appears in diverse positions. Rare words that appear in many different contexts get a fair share. Feature flag: kneser_ney.

🎲 Adaptive Ensemble Mixer (`hedge.rs` — 235 LoC)

HEDGE (Hedge Algorithm) is an online learning algorithm that assigns exponential weights to each prediction signal. Signals that are right more often get more weight; signals that miss get down-weighted. The mixer adapts per-session without any offline training. Feature flag: hedge.

📈 Prediction Quality Telemetry (`eval.rs` — 256 LoC)

Acceptance rate metrics: what fraction of predictions were accepted, at what rank, after how many keystrokes. The evaluator runs in a background ring buffer and feeds its signals back to the ensemble mixer.

By the Numbers

Metric	v0.3.0	v0.4.0
Files changed	—	18
Lines added	—	+2,333
Lines removed	—	-52
Net growth	—	+2,281 LoC
New modules	—	8
`ppm.rs`	—	323 LoC
`bpe.rs`	—	263 LoC
`eval.rs`	—	256 LoC
`kneser_ney.rs`	—	247 LoC
`hedge.rs`	—	235 LoC
`lang_detect.rs`	—	216 LoC
`session_cache.rs`	—	194 LoC
`lang_cvm.rs`	—	182 LoC
`ensemble.rs`	~400 LoC	~670 LoC (+270)
Tests	~100	~150+

Feature Flags

Experimental modules are gated behind Cargo feature flags — the default build is fast and stable, opt in to the research algorithms when you want them:

Flag	Module	Status
`ppm`	Prediction by Partial Matching	Experimental
`bpe`	Byte Pair Encoding tokeniser	Experimental
`kneser_ney`	Modified Kneser-Ney smoothing	Experimental
`hedge`	Adaptive ensemble mixer	Experimental
`session_cache`	Burstiness-weighted session cache	Stable

# Enable all experimental modules:
cargo build --release --features ppm,bpe,kneser_ney,hedge

Full Changelog

Features

feat(core): lang_detect.rs — Unicode block-based language detection (216 LoC)
feat(core): lang_cvm.rs — per-language CVM vocabulary counters (182 LoC)
feat(core): session_cache.rs — burstiness-weighted session word cache (194 LoC)
feat(core): ppm.rs — Prediction by Partial Matching, character-level (323 LoC) [feature: ppm]
feat(core): bpe.rs — Byte Pair Encoding subword tokeniser (263 LoC) [feature: bpe]
feat(core): eval.rs — acceptance rate metrics and prediction quality measurement (256 LoC)
feat(core): kneser_ney.rs — modified Kneser-Ney smoothing for rare words (247 LoC) [feature: kneser_ney]
feat(core): hedge.rs — adaptive ensemble mixer with exponential weights (235 LoC) [feature: hedge]
feat(core): ensemble.rs expanded — +270 LoC multi-signal blending with per-source normalisation

Fixes

fix(win): Make category registration non-fatal for HKCU install — startup no longer aborts if optional registry key is inaccessible

What's Next

Dual-buffer layout-agnostic input: type in any script on any keyboard layout (v0.5.0)
IBus replace handler for automatic layout switching
evdev scancode → char lookup tables (EN QWERTY + BG Phonetic)
Cold-start wrong-layout detection via corpus frequency comparison
Windows installer (MSI) + macOS .app bundle

Install

git clone https://github.com/RMANOV/smartkey.git
cd smartkey
git checkout v0.4.0
cargo build --release
cargo test --workspace   # ~150+ tests
# Optional: enable experimental modules
cargo build --release --features ppm,bpe,kneser_ney,hedge

Linux (IBus): maturin develop -m crates/smartkey-py/Cargo.toml --release
Windows (TSF): cargo build -p smartkey-win --release → register DLL
macOS (IMK): cargo build -p smartkey-mac --release → link from Swift

Full diff: v0.3.0...v0.4.0

Assets 2

14 Mar 10:19

RMANOV

v0.3.0

f30880d

v0.3.0 — "Performance & Prediction Quality"

┌─────────────────────────────────────────────────────────────────┐
│  SmartKey v0.3.0 — "Performance & Prediction Quality"           │
│                                                                 │
│  7 commits · 19 files · +1,071 lines · ~100 tests              │
│  Hash probe elimination · Tab fix · Non-admin install           │
└─────────────────────────────────────────────────────────────────┘

TL;DR

v0.3.0 is a focused sharpening release: redundant hash lookups eliminated, the Tab key now correctly commits text instead of ghosting, and Windows users can install SmartKey without administrator privileges. Personal vocabulary persistence lands in personal.rs, and the ensemble scorer gets a significant expansion.

Highlights

⚡ Hash Probe Elimination

Post-review optimization pass removed every redundant contains_key + get double-lookup from hot prediction paths. Each probe that survived a get_or_insert pattern is now a single operation. On a 100K corpus the CPU branch-prediction benefit is measurable.

⌨️ Tab CommitText Fix

Tab was being forwarded to the application before SmartKey's ghost text was committed. Result: the candidate disappeared silently, the cursor advanced, and the typed prefix was left dangling. Fixed: Tab now issues CommitText first, then forwards — matching user expectation universally.

🪟 Non-Admin HKCU Install (Windows)

SmartKey previously required Administrator rights to write its COM registration to HKLM. v0.3.0 introduces per-user registration via HKCU\Software\Classes — no elevation prompt, no IT-department ticket needed.

📚 Personal Vocabulary Persistence (`personal.rs`)

219-line module for per-user vocabulary tracking: words you accept get recorded, persisted to JSON, and boosted on next load. The feedback loop is now on disk, surviving restarts.

🧮 Ensemble Scorer Expansion (`ensemble.rs`)

+200 lines of scoring logic: confidence weighting per signal, per-source normalisation, and improved blending of Markov trigram + CVM frequency scores. Prediction quality measurably improves on mixed-language corpora.

🔧 Clippy Lint Cleanup

map_or(false, |x| pred(x)) → is_some_and(|x| pred(x)) across the codebase. Cleaner, faster, idiomatic.

By the Numbers

Metric	v0.2.0	v0.3.0
Files changed	—	19
Lines added	—	+1,071
Lines removed	—	-393
Net growth	—	+678 LoC
`personal.rs`	—	219 LoC (new)
`ensemble.rs`	~200 LoC	~400 LoC (+200)
Admin required (Windows)	Yes	No (HKCU)

Full Changelog

Features

feat(core): personal.rs — 219 LoC personal vocabulary persistence (JSON round-trip)
feat(core): ensemble.rs expanded — +200 LoC, improved confidence weighting and signal blending
feat(win): Non-admin HKCU COM registration — install without elevation

Fixes

fix(core): Tab CommitText bug — Tab now commits ghost text before forwarding
fix(core): Post-review optimizations — eliminate redundant hash probes and allocations
fix(core): Remove dead dependencies and dead code
fix(clippy): map_or(false, ..) → is_some_and(..) in ngram.rs

Refactoring

refactor: Merge feature/perf-optimization and feature/smartkey-v030 — clean history

What's Next

Language detection — per-script Unicode block analysis (v0.4.0)
Prediction by Partial Matching (PPM) character model
Kneser-Ney smoothing for rare-word handling
Adaptive ensemble mixer with exponential weights
Per-language CVM counters for zero-config multilingual support

Install

git clone https://github.com/RMANOV/smartkey.git
cd smartkey
git checkout v0.3.0
cargo build --release
cargo test --workspace

Linux (IBus): maturin develop -m crates/smartkey-py/Cargo.toml --release
Windows (TSF): cargo build -p smartkey-win --release → register DLL (no elevation needed)
macOS (IMK): cargo build -p smartkey-mac --release → link from Swift

Full diff: v0.2.0...v0.3.0

Assets 2

12 Mar 14:37

RMANOV

v0.2.0

917aecc

v0.2.0 — Cross-Platform

┌─────────────────────────────────────────────────────────────────┐
│  SmartKey v0.2.0 — "Cross-Platform"                             │
│                                                                 │
│  41 commits · 47 files · +9,132 lines · 106 tests              │
│  3 platforms · 4 crates · 8 benchmark groups · 0 unsafe panics  │
└─────────────────────────────────────────────────────────────────┘

   Markov trigram:  125 ns     CVM score:  151 ns
   Full predict:    137 µs     Fuzzy (1K): 505 µs

TL;DR

SmartKey grew from a Linux-only IBus engine into a cross-platform prediction engine with Windows TSF, macOS IMK adapters, fuzzy matching, CVM recency decay, and honest benchmark numbers.

Highlights

🪟 Windows TSF — Full IME Integration

Complete Text Services Framework adapter: COM DLL, class factory, TSF registration, composition lifecycle, ghost text display attributes. Key innovations:

ToUnicodeEx key mapping — resolves Shift, CapsLock, Cyrillic, and any installed keyboard layout. No more hardcoded ASCII.
Smart key claiming — OnTestKeyDown only claims keys SmartKey processes. Tab/Right/Escape only when ghost text is visible. Stops blocking other IMEs and app shortcuts.
Dead-key safe — dwFlags=4 preserves compose sequences (e.g. ^ + e → ê).

🍎 macOS IMK — C FFI + Swift

Complete C FFI surface for Swift IMKInputController:

smartkey_handle_key / smartkey_focus_lost / smartkey_reset — full event lifecycle
smartkey_load_trigram + smartkey_load_corpus_file — corpus loading (new in v0.2.0)
smartkey_save_personal / smartkey_load_personal — vocabulary persistence

🔍 Fuzzy Matching — Damerau-Levenshtein

When exact prefix fails, SmartKey falls back to fuzzy search with configurable edit distance:

Handles substitutions, insertions, deletions, and transpositions
"functon" → "function" (1 edit), "hlelo" → "hello" (1 transposition)
Fuzzy results ranked below exact matches via configurable discount factor
Benchmark: 505 µs for 1K corpus / 1 edit — fast enough for real-time

🧠 CVM Recency Decay

The CVM streaming counter now has time-aware vocabulary adaptation:

Words you stop typing decay probabilistically over rounds
decay_lambda parameter controls half-life (default: 0.1)
refresh() resets decay clock on re-encounter — frequent words survive
Benchmark: 151 ns per frequency score — nanosecond-class personal boosting

📊 Criterion Benchmark Suite — Real Numbers

8 benchmark groups, 316 lines, measuring every component in isolation:

Component	Median	Notes
Markov (no context)	6.4 ns	Hash lookup, essentially O(1)
Markov (trigram backoff)	125 ns	Katz interpolation
CVM frequency score	151 ns	Streaming cardinality
Full predict (1K corpus)	137 µs	Well under 16ms frame budget
Full predict (10K corpus)	1.74 ms	Still real-time capable
Fuzzy search (1K, 1-edit)	505 µs	Damerau-Levenshtein
Personal JSON round-trip	62.5 µs	CVM snapshot save/load

Architecture Evolution

v0.1.0:                          v0.2.0:

smartkey-core ← smartkey-py      smartkey-core (Apache 2.0)
      │              │                 │
      │         IBus (Linux)     ┌─────┼─────────┐
      └──────────────┘           │     │         │
                            smartkey-py │    smartkey-mac
                              (PyO3)   │     (C FFI)
                              Linux  smartkey-win  macOS
                                      (TSF/COM)
                                      Windows

From 2 crates to 4. The core remains Apache 2.0 — embed it anywhere.

By the Numbers

Metric	v0.1.0	v0.2.0
Platforms	1 (Linux)	3 (Linux, Windows, macOS)
Crates	2	4
Tests	38	106 (+179%)
Benchmark groups	0	8
Corpus (EN)	~3K unigrams	100K+ unigrams
Corpus (BG)	~2K unigrams	100K+ unigrams
Integration tests	0	5
CI	None	GitHub Actions

Full Changelog

Features

feat(core): Extract InputMethodCore — cross-platform key event state machine
feat(core): Fuzzy prefix matching with Damerau-Levenshtein fallback
feat(cvm): Recency decay for frequency scoring (time-aware adaptation)
feat(win): Windows TSF scaffold → COM DLL → composition → display attributes → ToUnicodeEx
feat(mac): macOS IMK scaffold → C FFI → trigram + corpus file loading
feat(py): from_config, export/import_personal in PyO3 bridge
feat(bench): Criterion benchmark suite (8 groups, 316 lines)
feat(corpus): Scale EN and BG corpora to 100K+ unigrams + tech corpus (421 terms)

Fixes

fix(win): ToUnicodeEx for layout-aware key mapping (Shift, Cyrillic)
fix(win): Smart key claiming — stops blocking other IMEs
fix(win): Dead-key composition safety (dwFlags=4)
fix(win): Honor riid in COM entry points, track live objects
fix(ibus): Graceful daemon shutdown (SIGTERM/SIGINT handlers)
fix(ibus): Corpus deduplication (prevent .json + .bin double-loading)
fix: 40+ code review findings (4 critical, 10 important, 26 low)
fix(ci): Deterministic test_personal_boost + corpus tracking

Refactoring

refactor(ibus): Delegate corpus loading to Rust (removes 31 lines of Python)
refactor: Consolidate config parsing, remove dead code

Documentation

docs(readme): Fix phantom features, mark completed roadmap items
docs(readme): Replace aspirational targets with actual benchmark numbers
docs: CI workflow + cross-platform README update

What's Next (Roadmap)

Memory-mapped binary corpus format (mmap — eliminate JSON parse at startup)
GTK4 popup panel for Alt-hold alternatives
SQLite persistence for CVM personal vocabulary
BK-tree or SymSpell for O(1) fuzzy lookup at scale
Windows installer (MSI) + macOS .app bundle
Mobile port (Android IME via JNI)

Install

git clone https://github.com/RMANOV/smartkey.git
cd smartkey
cargo build --release
cargo test --workspace   # 106 tests
cargo bench              # 8 benchmark groups

Full diff: v0.1.0...v0.2.0

Assets 2

08 Mar 13:38

RMANOV

v0.1.0

1519f05

v0.1.0 — Genesis: The Keyboard That Learns You

SmartKey v0.1.0 — Genesis

The keyboard that learns you, not the other way around.

SmartKey is a system-wide predictive keyboard engine for Linux that predicts the next word — not the next paragraph. Pure Rust core. Zero cloud. Sub-microsecond lookups. Your typing patterns never leave your machine.

What's Inside

Prediction Engine (Rust, 1,258 LoC)

A 3-stage ensemble scorer that blends three independent signals into a single ranked prediction:

Stage	Algorithm	Latency	What It Does
N-gram Trie	Aho-Corasick prefix search	< 1μs	Finds all words matching your typed prefix
Markov Chain	Katz backoff (trigram → bigram → unigram)	< 5μs	Scores candidates by context probability
CVM Counter	Streaming cardinality estimator	< 2μs	Boosts words you type frequently

final_score = α · corpus + β · markov + γ · personal
            = 0.4 · "how common" + 0.4 · "how likely here" + 0.2 · "how often YOU say this"

After a week of coding, async outranks assistant. After switching to Bulgarian, здравей surfaces automatically. No retraining. No config. Just typing.

IBus Integration (Python, 499 LoC)

Full IBus.Engine subclass with:

Ghost text — top prediction appears inline as dimmed text
Tab to accept entire word, → to accept one character, Esc to dismiss
Super+Escape kill switch — instantly disables/re-enables
Automatic context tracking (5-word sliding window)
Multi-language corpus loading (EN + BG out of the box)

Corpus Pipeline (Python + Bash, 793 LoC)

Production-grade Wikipedia-to-JSON pipeline:

download_corpus.sh — downloads Wikipedia article dumps (any language)
extract_wiki.py — streaming MediaWiki XML extractor (18 regex cleanup passes, skips redirects/stubs)
build_corpus.py — sentence-aware tokenizer with contraction + Cyrillic support, streaming n-gram counter
corpus_stats.py — quality metrics: Zipf exponent, hapax legomena, coverage curves

# Build a Bulgarian corpus from Wikipedia in one pipeline:
./download_corpus.sh --lang bg
python3 extract_wiki.py raw/bgwiki-*.xml.bz2 | python3 build_corpus.py --stdin --lang bg --min-freq 3

By the Numbers

Metric	Value
Rust LoC	1,313
Python LoC	1,193
Test suite	38 tests, 0 failures
Modules tested	5 (CVM, n-gram, Markov, prefix, ensemble)
Dependencies	Minimal (aho-corasick, rand, pyo3, serde)
Prediction latency	< 10μs target
Memory footprint	< 50MB target (EN+BG corpora)
Cloud dependency	Zero. Everything is local.

The CVM Innovation

Most keyboards use frequency tables or neural nets. SmartKey uses the Chakraborty–Vinodchandran–Meel (CVM) algorithm — a streaming cardinality estimator that naturally provides:

Vocabulary decay — words you stop typing get probabilistically evicted
Automatic adaptation — no timestamps, no TTLs, no explicit forgetting
O(log N) memory — tracks your entire vocabulary in kilobytes
Zero-config language detection — per-language CVM counters detect script switches mid-sentence

cvm.frequency_score("function")  // → 64.0 (you're a programmer)
cvm.frequency_score("synergy")   // → 0.0  (you stopped writing corporate email)

Dual License

smartkey-core (Rust library) → Apache 2.0 — embed in your mobile keyboard, IDE plugin, or WASM app
Application layer (IBus, corpus, PyO3) → GPL 3.0 — derivatives stay open

Quick Start

git clone https://github.com/RMANOV/smartkey.git
cd smartkey
cargo build --release
cargo test -p smartkey-core    # 38 tests, 0 failures

# Build PyO3 bridge
pip install maturin
maturin develop -m crates/smartkey-py/Cargo.toml --release

# Register with IBus
cp ibus/smartkey.xml ~/.local/share/ibus/component/
ibus restart

What's Next (Roadmap)

Criterion benchmark suite with latency targets
Memory-mapped binary corpus (mmap, no JSON parse at startup)
GTK4 popup panel for Alt-hold alternatives
SQLite persistence for CVM personal vocabulary
Pre-built EN + BG corpus packages
Typo correction via Levenshtein distance (Damerau-Levenshtein, v0.2.0)
Android IME port via JNI

Built with Rust, Python, and obsessive attention to keystroke latency.

Assets 2

Releases: RMANOV/smartkey

v0.6.0 — "The Brain"

TL;DR

Highlights

🧠 Proactive Master Algorithm (master_loop.rs — 750 LoC)

📊 Typing Regime Detection (typing_regime.rs — 200 LoC)

🔠 CapsEngine (caps.rs — 100 LoC)

😤 Frustration Detector (frustration.rs — 200 LoC)

🔧 Tech Vocabulary (tech_vocab.rs — 150 LoC)

🧬 Per-Language Model Bank (lang_model.rs — 100 LoC)

🔄 Correction Memory (correction_memory.rs — 150 LoC)

📡 FFI Protocol (ffi_protocol.rs — 80 LoC)

👤 Light Profile (light_profile.rs — 100 LoC)

🎮 Playground Simulation Harness (smartkey-playground — new crate)

Architecture

Preedit-First Dual Buffer

Space Auto-Commit + Syntactic Casing

By the Numbers

Full Changelog

Features

Fixes

Chores

Known Issues / Tech Debt

Uh oh!

v0.5.0 — "Never Switch Again"

TL;DR

Highlights

🔄 Dual-Buffer Layout-Agnostic Input (dual_buffer.rs — 399 LoC)

⌨️ evdev Keymap Tables (keymap.rs — 323 LoC)

🔁 IBus Replace Handler (ibus/)

🌡️ Cold-Start Wrong-Layout Detection

🛠️ ReplaceWord Across All Crates

🦀 Rust 1.94 Clippy Compliance

By the Numbers

Known Issues / Tech Debt

Full Changelog

Features

Fixes

What's Next (v0.6.0 and beyond)

Install

Uh oh!

v0.4.0 — "Algorithmic Intelligence"

TL;DR

Highlights

🌐 Instant Language Detection (lang_detect.rs — 216 LoC)

📊 Per-Language CVM Counters (lang_cvm.rs — 182 LoC)

⚡ Session Burst Cache (session_cache.rs — 194 LoC)

📐 Prediction by Partial Matching (ppm.rs — 323 LoC)

🔤 BPE Subword Tokeniser (bpe.rs — 263 LoC)

📏 Modified Kneser-Ney Smoothing (kneser_ney.rs — 247 LoC)

🎲 Adaptive Ensemble Mixer (hedge.rs — 235 LoC)

📈 Prediction Quality Telemetry (eval.rs — 256 LoC)

By the Numbers

Feature Flags

Full Changelog

Features

Fixes

What's Next

Install

Uh oh!

v0.3.0 — "Performance & Prediction Quality"

TL;DR

Highlights

⚡ Hash Probe Elimination

⌨️ Tab CommitText Fix

🪟 Non-Admin HKCU Install (Windows)

📚 Personal Vocabulary Persistence (personal.rs)

🧮 Ensemble Scorer Expansion (ensemble.rs)

🔧 Clippy Lint Cleanup

By the Numbers

Full Changelog

Features

Fixes

Refactoring

What's Next

Install

Uh oh!

v0.2.0 — Cross-Platform

TL;DR

Highlights

🧠 Proactive Master Algorithm (`master_loop.rs` — 750 LoC)

📊 Typing Regime Detection (`typing_regime.rs` — 200 LoC)

🔠 CapsEngine (`caps.rs` — 100 LoC)

😤 Frustration Detector (`frustration.rs` — 200 LoC)

🔧 Tech Vocabulary (`tech_vocab.rs` — 150 LoC)

🧬 Per-Language Model Bank (`lang_model.rs` — 100 LoC)

🔄 Correction Memory (`correction_memory.rs` — 150 LoC)

📡 FFI Protocol (`ffi_protocol.rs` — 80 LoC)

👤 Light Profile (`light_profile.rs` — 100 LoC)

🎮 Playground Simulation Harness (`smartkey-playground` — new crate)

🔄 Dual-Buffer Layout-Agnostic Input (`dual_buffer.rs` — 399 LoC)

⌨️ evdev Keymap Tables (`keymap.rs` — 323 LoC)

🔁 IBus Replace Handler (`ibus/`)

🌐 Instant Language Detection (`lang_detect.rs` — 216 LoC)

📊 Per-Language CVM Counters (`lang_cvm.rs` — 182 LoC)

⚡ Session Burst Cache (`session_cache.rs` — 194 LoC)

📐 Prediction by Partial Matching (`ppm.rs` — 323 LoC)

🔤 BPE Subword Tokeniser (`bpe.rs` — 263 LoC)

📏 Modified Kneser-Ney Smoothing (`kneser_ney.rs` — 247 LoC)

🎲 Adaptive Ensemble Mixer (`hedge.rs` — 235 LoC)

📈 Prediction Quality Telemetry (`eval.rs` — 256 LoC)

📚 Personal Vocabulary Persistence (`personal.rs`)

🧮 Ensemble Scorer Expansion (`ensemble.rs`)