Releases: RMANOV/smartkey
v0.6.0 — "The Brain"
┌─────────────────────────────────────────────────────────────────┐
│ SmartKey v0.6.0 — "The Brain" │
│ │
│ 22 commits · 45 files · +6,474 lines · 311 tests │
│ Master Algorithm · Regime Detection · CapsEngine · Frustration │
│ CorrectionMemory · TechVocab · LangModel · FFI Protocol │
└─────────────────────────────────────────────────────────────────┘
TL;DR
v0.5.0 taught SmartKey to detect your keyboard layout. v0.6.0 teaches it to understand you. A new proactive Master Algorithm observes your typing patterns, classifies your current regime (fast coding? deliberate prose? language switching?), tracks frustration signals, remembers your corrections, applies smart capitalisation — and adjusts every prediction threshold in real-time. The dual-buffer engine from v0.5.0 is now wrapped in a 4-phase state machine that anticipates, tracks, corrects, and learns. 311 tests verify it all.
Highlights
🧠 Proactive Master Algorithm (master_loop.rs — 750 LoC)
The centrepiece of v0.6.0. A decorator-pattern orchestrator wrapping InputMethodCore with a 4-phase state machine:
ANTICIPATING → TRACKING → CORRECTING → LEARNING
│ │ │ │
▼ ▼ ▼ ▼
Read context Follow word Detect bad Update profile
Set lang prior Build ghost prediction Save corrections
Bias ensemble Monitor WPM Suppress it Adjust thresholds
Key innovation: Hints injection. Before each prediction, MasterLoop computes a Hints struct (lang_prior, suppress_ghost, confidence_boost) and passes it down to the ensemble. This means predictions adapt to context before the user finishes typing — not after.
📊 Typing Regime Detection (typing_regime.rs — 200 LoC)
Inspired by market regime detection in algorithmic trading. Five EMA signals (WPM, word length, out-of-vocabulary rate, dual-buffer flip rate, language ratio) are tracked per word and classified into 5 regimes:
| Regime | Trigger | Effect |
|---|---|---|
FastCoding |
High WPM + OOV + short words | TechVocab δ weight ↑, ghost threshold ↓ |
DeliberateProse |
Low WPM + long words | Markov β weight ↑, longer ghost text |
MixedLanguage |
High flip rate | Dual-buffer stays unlocked longer |
NewTopic |
OOV spike | Reset session cache, broaden predictions |
LanguageSwitch |
Sharp lang ratio flip | Lock to new language faster |
🔠 CapsEngine (caps.rs — 100 LoC)
The trie stores all words lowercase. CapsEngine detects the caps pattern from what you've typed so far and applies it to predictions:
Normal— first-letter capitalisation at sentence startAllCaps→HELLOCamelCase→camelCasePascalCase→PascalCase
Proper nouns are capitalised regardless. This is why the CI tests needed eq_ignore_ascii_case — predictions are no longer always lowercase.
😤 Frustration Detector (frustration.rs — 200 LoC)
Real-time keystroke pattern monitor detecting 4 frustration signals:
- REJECT — Backspace within 500ms of Tab acceptance → you didn't want that prediction
- RAPID_DELETE — 3+ Backspace in <1 second → something went wrong
- RETYPE — Delete all + retype in different script → wrong language was locked
- ABANDON — Escape + continued manual typing → user gave up on predictions
Each signal carries severity [0.0, 1.0]. When frustration spikes, ghost text is suppressed for the next N words. The Master Algorithm's LEARNING phase uses these signals to adjust LightProfile.confidence_floor.
🔧 Tech Vocabulary (tech_vocab.rs — 150 LoC)
300+ hardcoded technical terms (programming keywords, framework names, CLI commands) with frequency-based scoring. Active as the δ weight in the ensemble only during FastCoding regime — prevents function from appearing while writing Bulgarian prose.
🧬 Per-Language Model Bank (lang_model.rs — 100 LoC)
Each detected language gets its own isolated model bank (trie + Markov + PPM + Kneser-Ney). Prevents cross-language contamination: Bulgarian Markov chains don't bleed into English predictions and vice versa.
🔄 Correction Memory (correction_memory.rs — 150 LoC)
When the user corrects the same prediction 3+ times in the same context, the bad prediction is suppressed and the correction is shown instead. Uses 2-word context hashing and LRU eviction (max 500 entries). Serialises to PersonalProfile v4 for persistence across sessions.
📡 FFI Protocol (ffi_protocol.rs — 80 LoC)
Single source of truth for all FFI serialisation: ReplaceWord payloads use \x1F (Unit Separator) between replace_len and text; ShowComposing uses \x00 between typed and ghost portions. Shared by PyO3, C FFI, and test harnesses — eliminates parser mismatches.
👤 Light Profile (light_profile.rs — 100 LoC)
Adaptive confidence tuning based on rolling accept/reject ratio:
- High acceptance → lower confidence floor → more ghost suggestions shown
- Low acceptance → higher floor → fewer, higher-quality suggestions
- Formula:
max(0.15, 0.50 - 0.35 × accept_rate) - Auto-suppresses ghost text for 2 words after a rejection
🎮 Playground Simulation Harness (smartkey-playground — new crate)
Closed-loop simulator driving InputMethodCore through target text, measuring keystroke efficiency, prediction accuracy, and ghost acceptance rates. Includes preset scenarios and HTML report generation.
Architecture
v0.6.0 adds a new orchestration layer above the existing prediction stack:
┌──────────────────────────────────────────────────────┐
│ MasterLoop (new) │
│ ┌─ RegimeDetector ── FrustrationDetector │
│ ├─ LightProfile ──── CorrectionMemory │
│ └─ Hints → InputMethodCore │
│ ┌─ DualBuffer (v0.5.0) │
│ ├─ CapsEngine (new) │
│ └─ Ensemble │
│ ├─ α Corpus frequency │
│ ├─ β Markov chain │
│ ├─ γ CVM counter │
│ └─ δ TechVocab (new, regime-gated) │
└──────────────────────────────────────────────────────┘
Preedit-First Dual Buffer
input.rs refactored from 1,168 → 1,824 LoC. The dual buffer now operates in preedit mode: text is held in a composition buffer before commit, eliminating character doubling that plagued v0.5.0. Ghost text comparison is now Unicode-safe.
Space Auto-Commit + Syntactic Casing
Pressing Space now auto-commits the current word (no explicit Tab needed for unambiguous predictions). After commit, CapsEngine applies casing rules: sentence-start capitalisation, proper noun detection, regime-based transforms.
By the Numbers
| Metric | v0.5.0 | v0.6.0 |
|---|---|---|
| Files changed | — | 45 |
| Lines added | — | +6,474 |
| Lines removed | — | -358 |
| Net growth | — | +6,116 LoC |
| New modules | — | 10 |
| New crate | — | smartkey-playground |
master_loop.rs |
— | 750 LoC (new) |
typing_regime.rs |
— | 200 LoC (new) |
frustration.rs |
— | 200 LoC (new) |
tech_vocab.rs |
— | 150 LoC (new) |
correction_memory.rs |
— | 150 LoC (new) |
caps.rs |
— | 100 LoC (new) |
lang_model.rs |
— | 100 LoC (new) |
light_profile.rs |
— | 100 LoC (new) |
ffi_protocol.rs |
— | 80 LoC (new) |
ensemble.rs |
~670 LoC | 1,394 LoC (+108%) |
input.rs |
~1,168 LoC | 1,824 LoC (+56%) |
virtual_typer.rs |
— | 881 LoC (new test) |
| Tests | 207 | 311 (+50%) |
Full Changelog
Features
feat(core):master_loop.rs— proactive 4-phase Master Algorithm with Hints injection (750 LoC)feat(core):typing_regime.rs— 5-regime detection via EMA signals (200 LoC)feat(core):caps.rs— CapsEngine: smart capitalisation from typed prefix (100 LoC)feat(core):tech_vocab.rs— 300+ technical terms as ensemble δ signal (150 LoC)feat(core):frustration.rs— 4-signal frustration detector (200 LoC)feat(core):lang_model.rs— per-language model isolation bank (100 LoC)feat(core):correction_memory.rs— repeated correction suppression with LRU (150 LoC)feat(core):light_profile.rs— adaptive confidence floor tuning (100 LoC)feat(core):ffi_protocol.rs— unified FFI serialisation contract (80 LoC)feat(core): preedit-first dual buffer — eliminate character doublingfeat(core): aggressive space auto-commit and syntactic casing restorationfeat(core): per-language models + smart caps for 2× prediction accuracyfeat(core): confidence hardening + eval histogramfeat(playground): closed-loop simulation harness for prediction quality eval
Fixes
fix(core): bug hunt v15 — 10 fixes + 4 simplifications in Master Algorithmfix(core): wire flip detection, clear momentum on reset, harden ghost textfix(core): Unicode-safe ghost text comparison, remove deadsentence_startfieldfix(core): resolve compile errors and ghost text regressionfix(core): case-insensitive assertions in integration + virtual_typer testsfix: ReplaceWord parser mismatch, macOS handler, overflow guardsfix: confidence model, overflow guards, code quality improvementsfix: parallel surgery — preedit lifecycle, language detection, prediction quality
Chores
chore(core): remove appliedcaps.patch— history preserved in git blamerefactor(core): simplify hypothesis phase checks and reduce allocationsci: addsmartkey-playgroundto test and clippy jobs
Known Issues / Tech Debt
- Confidence single-candidate weakness: Softmax normalisation gives confidence=1.0 to a single low-score prediction. Mitigated by
ghost_text_min_scorethreshold, but a pro...
v0.5.0 — "Never Switch Again"
┌─────────────────────────────────────────────────────────────────┐
│ SmartKey v0.5.0 — "Never Switch Again" │
│ │
│ 12 commits · 27 files · +1,985 lines · 207 tests │
│ Dual-buffer · evdev keymaps · IBus replace · layout-agnostic │
└─────────────────────────────────────────────────────────────────┘
TL;DR
Type zdravej on a QWERTY keyboard. SmartKey intercepts the hardware scancodes, maintains a parallel Bulgarian Phonetic interpretation in a second buffer, compares corpus frequencies, decides you meant Bulgarian, and outputs здравей — without you pressing Alt+Shift, Ctrl+Space, or touching any language-switch shortcut. That is v0.5.0 in one sentence.
Highlights
🔄 Dual-Buffer Layout-Agnostic Input (dual_buffer.rs — 399 LoC)
The centrepiece of this release. Architecture:
Hardware key press
│
▼
evdev scancode
(platform-neutral)
│
┌────┴────┐
▼ ▼
EN QWERTY BG Phonetic
buffer buffer
│ │
└────┬────┘
│
corpus frequency
comparison
│
confidence ≥ 0.7?
│
┌────┴────────────────┐
▼ ▼
lock + output keep accumulating
ReplaceWord action both buffers
Key design decisions:
- Confidence threshold 0.7 — SmartKey doesn't guess until it's sure. Below 0.7 both buffers run in parallel, above 0.7 the winning language is locked and
ReplaceWordis emitted. - Flip detection — if you suddenly type characters that only exist in the non-locked language, SmartKey unlocks, reassesses, and can flip. There's no permanent lock-in.
- ReplaceWord action — the existing
Actionenum gains a new variant. All downstream crates (win, py, ibus) handle it.
⌨️ evdev Keymap Tables (keymap.rs — 323 LoC)
Previous builds used XKB key codes in lookup tables — which are already layout-remapped. This made dual-buffer impossible (you can't detect layout from remapped codes). v0.5.0 fixes this at the root: lookup tables now use raw evdev scancodes (Linux kernel constants), which are hardware positions, not characters.
Two complete lookup tables:
EN_QWERTY_MAP: evdev scancode → ASCII character (+ Shift variant)BG_PHONETIC_MAP: evdev scancode → Cyrillic character (+ Shift variant), following the Bulgarian Phonetic standard layout
🔁 IBus Replace Handler (ibus/)
The IBus adapter gains a replace action handler: when ReplaceWord arrives, it erases the N characters in the composition buffer and commits the corrected text. Bug fixes in this path:
- Off-by-one in backspace count (was deleting one extra character)
- Wrong CapsLock bit in key event synthesised for backspace
- Consumed-key event was being committed as text
🌡️ Cold-Start Wrong-Layout Detection
On session start SmartKey has no typed history — the confidence estimator starts blind. v0.5.0 adds a frequency-based cold-start heuristic: compare the character frequency distribution of the current input prefix against expected EN and BG frequencies. Whichever distribution is less surprising wins the initial guess.
Prior approach used raw character counts, which failed on short prefixes. Frequency-ratio comparison is stable at 2–3 characters.
🛠️ ReplaceWord Across All Crates
The new Action::ReplaceWord(String) variant is handled in:
smartkey-core— emitted byDualBuffer::process_keysmartkey-py(PyO3 bridge) — exposed as("replace", text)tuple in Python APIsmartkey-win(TSF) —Key::RawCodearm added toshould_claimmatch; TSF composition handles replacesmartkey-ibus— replace handler implemented with corrected backspace synthesis
🦀 Rust 1.94 Clippy Compliance
New lints in Rust 1.94:
is_multiple_of— replaces manual% n == 0patternsunnecessary_unwrap— replacesif x.is_some() { x.unwrap() }withif let Some(v) = x- CI now runs on Rust stable with
-D warningsenforced
By the Numbers
| Metric | v0.4.0 | v0.5.0 |
|---|---|---|
| Files changed | — | 27 |
| Lines added | — | +1,985 |
| Lines removed | — | -152 |
| Net growth | — | +1,833 LoC |
dual_buffer.rs |
— | 399 LoC (new) |
keymap.rs |
— | 323 LoC (new) |
| Tests | ~150 | 207 (+38%) |
| Language switch shortcuts needed | Yes | No |
| Rust version compliance | 1.8x | 1.94 |
| CI Node.js | actions/checkout v4 | v5 (Node 24) |
Known Issues / Tech Debt
Being honest about what v0.5.0 does not yet solve:
- macOS: Dual-buffer is implemented in
smartkey-corebut the macOS IMK adapter does not yet plumbReplaceWordthrough to the Swift layer. macOS gets prediction quality improvements from v0.4.0 but not layout-agnostic input. - Windows:
Key::RawCodearm added; however the evdev scancode concept is Linux-specific. Windows uses virtual key codes, so thekeymap.rstables don't apply directly. The Windows path uses the existingToUnicodeExapproach — a unified scancode abstraction across platforms is planned. - Performance:
dual_buffer.rsis O(n) in prefix length per keypress. For normal typing (< 50 chars) this is imperceptible. Long pastes trigger one full re-evaluation. - Confidence tuning: The 0.7 threshold is hardcoded. A config parameter is planned.
- Single Cyrillic keyboard layout: Only Bulgarian Phonetic is in
keymap.rs. Russian Phonetic, Serbian Cyrillic, Ukrainian, etc. are not yet represented.
Full Changelog
Features
feat(core):dual_buffer.rs— DualBuffer struct, confidence locking, flip detection,ReplaceWordemission (399 LoC)feat(core):keymap.rs— evdev scancode lookup tables for EN QWERTY + BG Phonetic (323 LoC)feat(core):Action::ReplaceWord(String)variant — handled in all downstream cratesfeat(core): instant language detection + language-aware scoring (v0.4.1backport)
Fixes
fix(core): Use evdev keycodes instead of XKB in keymap lookup tables — enables true layout-agnostic detectionfix(core): Allow wrong-layout detection at cold start — frequency-based comparison replaces broken count-based heuristicfix(core): Replace broken count-based layout detection with frequency-ratio comparisonfix(core): HandleAction::ReplaceWordin all downstream cratesfix(core): v0.4.1 post-review cleanup — all clippy warnings resolvedfix(ibus): Add replace handler — erases composition buffer and commits corrected textfix(ibus): Fix replace action — off-by-one backspace count, wrong CapsLock bit, consumed-key commitfix(win): AddKey::RawCodearm to TSFshould_claimmatchfix(ci): Resolve Rust 1.94 clippy lints (is_multiple_of,unnecessary_unwrap)chore(ci): Bumpactions/checkoutv4 → v5 (Node.js 24)
What's Next (v0.6.0 and beyond)
- macOS: plumb
ReplaceWordthrough to IMK Swift layer - Windows: unified scancode abstraction (virtual key → evdev mapping)
- Config file: expose
confidence_threshold,flip_sensitivity, feature flags - Additional Cyrillic layouts: Russian Phonetic, Ukrainian, Serbian
- GTK4 popup panel — Alt-hold to see full candidate list
- Memory-mapped binary corpus format (eliminate JSON parse at startup)
- Windows MSI installer + macOS .app bundle
- Mobile: Android IME via JNI
Install
git clone https://github.com/RMANOV/smartkey.git
cd smartkey
cargo build --release
cargo test --workspace # 207 testsLinux (IBus — with layout-agnostic input):
maturin develop -m crates/smartkey-py/Cargo.toml --release
# Start typing — SmartKey detects your intended language automaticallyWindows (TSF): cargo build -p smartkey-win --release → register DLL (HKCU, no elevation)
macOS (IMK): cargo build -p smartkey-mac --release → link from Swift
Full diff: v0.4.0...v0.5.0
v0.4.0 — "Algorithmic Intelligence"
┌─────────────────────────────────────────────────────────────────┐
│ SmartKey v0.4.0 — "Algorithmic Intelligence" │
│ │
│ 2 commits · 18 files · +2,333 lines · ~150+ tests │
│ 8 new prediction modules · PPM · BPE · Kneser-Ney · Hedge │
└─────────────────────────────────────────────────────────────────┘
TL;DR
v0.4.0 is SmartKey's largest algorithmic leap. Eight new prediction modules land in one release: Unicode-based language detection, per-language vocabulary counters, character-level PPM compression, BPE subword tokenisation, modified Kneser-Ney smoothing, an adaptive ensemble mixer, acceptance-rate telemetry, and session-burst caching. This is the foundation for multilingual prediction without configuration.
Highlights
🌐 Instant Language Detection (lang_detect.rs — 216 LoC)
Unicode block classification: every character has a block (Latin, Cyrillic, Greek, Arabic, CJK…). SmartKey reads the block distribution of your last N keystrokes and infers the active language in microseconds — no ML model, no external corpus, no network call.
📊 Per-Language CVM Counters (lang_cvm.rs — 182 LoC)
Each detected language gets its own CVM streaming counter. As you type Bulgarian the BG counter accumulates; switch to English and the EN counter takes over. Vocabulary boosting is always language-contextual — you never get Bulgarian suggestions while typing English.
⚡ Session Burst Cache (session_cache.rs — 194 LoC)
Words typed this session are tracked with a burstiness weight: repeat a word three times in a row and it jumps to the front of suggestions immediately. The cache is in-memory only — fast, zero-persistence overhead.
📐 Prediction by Partial Matching (ppm.rs — 323 LoC)
PPM is a character-level context-compression algorithm (the same family as LZ77). SmartKey uses PPM-C: it maintains an order-5 context trie, falls back through order-4 → 3 → 2 → 1 → 0 → escape, and produces calibrated character probabilities. This is opt-in behind a feature flag (ppm).
🔤 BPE Subword Tokeniser (bpe.rs — 263 LoC)
Byte Pair Encoding builds a vocabulary of subword units from corpus merge rules. SmartKey's BPE tokeniser handles out-of-vocabulary words by decomposing them into known subwords — useful for technical terms, compound words, and transliteration fragments. Feature flag: bpe.
📏 Modified Kneser-Ney Smoothing (kneser_ney.rs — 247 LoC)
Standard trigram models assign zero probability to unseen sequences — Kneser-Ney fixes this by redistributing probability mass to contexts where a word appears in diverse positions. Rare words that appear in many different contexts get a fair share. Feature flag: kneser_ney.
🎲 Adaptive Ensemble Mixer (hedge.rs — 235 LoC)
HEDGE (Hedge Algorithm) is an online learning algorithm that assigns exponential weights to each prediction signal. Signals that are right more often get more weight; signals that miss get down-weighted. The mixer adapts per-session without any offline training. Feature flag: hedge.
📈 Prediction Quality Telemetry (eval.rs — 256 LoC)
Acceptance rate metrics: what fraction of predictions were accepted, at what rank, after how many keystrokes. The evaluator runs in a background ring buffer and feeds its signals back to the ensemble mixer.
By the Numbers
| Metric | v0.3.0 | v0.4.0 |
|---|---|---|
| Files changed | — | 18 |
| Lines added | — | +2,333 |
| Lines removed | — | -52 |
| Net growth | — | +2,281 LoC |
| New modules | — | 8 |
ppm.rs |
— | 323 LoC |
bpe.rs |
— | 263 LoC |
eval.rs |
— | 256 LoC |
kneser_ney.rs |
— | 247 LoC |
hedge.rs |
— | 235 LoC |
lang_detect.rs |
— | 216 LoC |
session_cache.rs |
— | 194 LoC |
lang_cvm.rs |
— | 182 LoC |
ensemble.rs |
~400 LoC | ~670 LoC (+270) |
| Tests | ~100 | ~150+ |
Feature Flags
Experimental modules are gated behind Cargo feature flags — the default build is fast and stable, opt in to the research algorithms when you want them:
| Flag | Module | Status |
|---|---|---|
ppm |
Prediction by Partial Matching | Experimental |
bpe |
Byte Pair Encoding tokeniser | Experimental |
kneser_ney |
Modified Kneser-Ney smoothing | Experimental |
hedge |
Adaptive ensemble mixer | Experimental |
session_cache |
Burstiness-weighted session cache | Stable |
# Enable all experimental modules:
cargo build --release --features ppm,bpe,kneser_ney,hedgeFull Changelog
Features
feat(core):lang_detect.rs— Unicode block-based language detection (216 LoC)feat(core):lang_cvm.rs— per-language CVM vocabulary counters (182 LoC)feat(core):session_cache.rs— burstiness-weighted session word cache (194 LoC)feat(core):ppm.rs— Prediction by Partial Matching, character-level (323 LoC) [feature:ppm]feat(core):bpe.rs— Byte Pair Encoding subword tokeniser (263 LoC) [feature:bpe]feat(core):eval.rs— acceptance rate metrics and prediction quality measurement (256 LoC)feat(core):kneser_ney.rs— modified Kneser-Ney smoothing for rare words (247 LoC) [feature:kneser_ney]feat(core):hedge.rs— adaptive ensemble mixer with exponential weights (235 LoC) [feature:hedge]feat(core):ensemble.rsexpanded — +270 LoC multi-signal blending with per-source normalisation
Fixes
fix(win): Make category registration non-fatal for HKCU install — startup no longer aborts if optional registry key is inaccessible
What's Next
- Dual-buffer layout-agnostic input: type in any script on any keyboard layout (v0.5.0)
- IBus replace handler for automatic layout switching
- evdev scancode → char lookup tables (EN QWERTY + BG Phonetic)
- Cold-start wrong-layout detection via corpus frequency comparison
- Windows installer (MSI) + macOS .app bundle
Install
git clone https://github.com/RMANOV/smartkey.git
cd smartkey
git checkout v0.4.0
cargo build --release
cargo test --workspace # ~150+ tests
# Optional: enable experimental modules
cargo build --release --features ppm,bpe,kneser_ney,hedgeLinux (IBus): maturin develop -m crates/smartkey-py/Cargo.toml --release
Windows (TSF): cargo build -p smartkey-win --release → register DLL
macOS (IMK): cargo build -p smartkey-mac --release → link from Swift
Full diff: v0.3.0...v0.4.0
v0.3.0 — "Performance & Prediction Quality"
┌─────────────────────────────────────────────────────────────────┐
│ SmartKey v0.3.0 — "Performance & Prediction Quality" │
│ │
│ 7 commits · 19 files · +1,071 lines · ~100 tests │
│ Hash probe elimination · Tab fix · Non-admin install │
└─────────────────────────────────────────────────────────────────┘
TL;DR
v0.3.0 is a focused sharpening release: redundant hash lookups eliminated, the Tab key now correctly commits text instead of ghosting, and Windows users can install SmartKey without administrator privileges. Personal vocabulary persistence lands in personal.rs, and the ensemble scorer gets a significant expansion.
Highlights
⚡ Hash Probe Elimination
Post-review optimization pass removed every redundant contains_key + get double-lookup from hot prediction paths. Each probe that survived a get_or_insert pattern is now a single operation. On a 100K corpus the CPU branch-prediction benefit is measurable.
⌨️ Tab CommitText Fix
Tab was being forwarded to the application before SmartKey's ghost text was committed. Result: the candidate disappeared silently, the cursor advanced, and the typed prefix was left dangling. Fixed: Tab now issues CommitText first, then forwards — matching user expectation universally.
🪟 Non-Admin HKCU Install (Windows)
SmartKey previously required Administrator rights to write its COM registration to HKLM. v0.3.0 introduces per-user registration via HKCU\Software\Classes — no elevation prompt, no IT-department ticket needed.
📚 Personal Vocabulary Persistence (personal.rs)
219-line module for per-user vocabulary tracking: words you accept get recorded, persisted to JSON, and boosted on next load. The feedback loop is now on disk, surviving restarts.
🧮 Ensemble Scorer Expansion (ensemble.rs)
+200 lines of scoring logic: confidence weighting per signal, per-source normalisation, and improved blending of Markov trigram + CVM frequency scores. Prediction quality measurably improves on mixed-language corpora.
🔧 Clippy Lint Cleanup
map_or(false, |x| pred(x)) → is_some_and(|x| pred(x)) across the codebase. Cleaner, faster, idiomatic.
By the Numbers
| Metric | v0.2.0 | v0.3.0 |
|---|---|---|
| Files changed | — | 19 |
| Lines added | — | +1,071 |
| Lines removed | — | -393 |
| Net growth | — | +678 LoC |
personal.rs |
— | 219 LoC (new) |
ensemble.rs |
~200 LoC | ~400 LoC (+200) |
| Admin required (Windows) | Yes | No (HKCU) |
Full Changelog
Features
feat(core):personal.rs— 219 LoC personal vocabulary persistence (JSON round-trip)feat(core):ensemble.rsexpanded — +200 LoC, improved confidence weighting and signal blendingfeat(win): Non-admin HKCU COM registration — install without elevation
Fixes
fix(core): Tab CommitText bug — Tab now commits ghost text before forwardingfix(core): Post-review optimizations — eliminate redundant hash probes and allocationsfix(core): Remove dead dependencies and dead codefix(clippy):map_or(false, ..)→is_some_and(..)inngram.rs
Refactoring
refactor: Mergefeature/perf-optimizationandfeature/smartkey-v030— clean history
What's Next
- Language detection — per-script Unicode block analysis (v0.4.0)
- Prediction by Partial Matching (PPM) character model
- Kneser-Ney smoothing for rare-word handling
- Adaptive ensemble mixer with exponential weights
- Per-language CVM counters for zero-config multilingual support
Install
git clone https://github.com/RMANOV/smartkey.git
cd smartkey
git checkout v0.3.0
cargo build --release
cargo test --workspaceLinux (IBus): maturin develop -m crates/smartkey-py/Cargo.toml --release
Windows (TSF): cargo build -p smartkey-win --release → register DLL (no elevation needed)
macOS (IMK): cargo build -p smartkey-mac --release → link from Swift
Full diff: v0.2.0...v0.3.0
v0.2.0 — Cross-Platform
┌─────────────────────────────────────────────────────────────────┐
│ SmartKey v0.2.0 — "Cross-Platform" │
│ │
│ 41 commits · 47 files · +9,132 lines · 106 tests │
│ 3 platforms · 4 crates · 8 benchmark groups · 0 unsafe panics │
└─────────────────────────────────────────────────────────────────┘
Markov trigram: 125 ns CVM score: 151 ns
Full predict: 137 µs Fuzzy (1K): 505 µs
TL;DR
SmartKey grew from a Linux-only IBus engine into a cross-platform prediction engine with Windows TSF, macOS IMK adapters, fuzzy matching, CVM recency decay, and honest benchmark numbers.
Highlights
🪟 Windows TSF — Full IME Integration
Complete Text Services Framework adapter: COM DLL, class factory, TSF registration, composition lifecycle, ghost text display attributes. Key innovations:
ToUnicodeExkey mapping — resolves Shift, CapsLock, Cyrillic, and any installed keyboard layout. No more hardcoded ASCII.- Smart key claiming —
OnTestKeyDownonly claims keys SmartKey processes. Tab/Right/Escape only when ghost text is visible. Stops blocking other IMEs and app shortcuts. - Dead-key safe —
dwFlags=4preserves compose sequences (e.g.^+e→ê).
🍎 macOS IMK — C FFI + Swift
Complete C FFI surface for Swift IMKInputController:
smartkey_handle_key/smartkey_focus_lost/smartkey_reset— full event lifecyclesmartkey_load_trigram+smartkey_load_corpus_file— corpus loading (new in v0.2.0)smartkey_save_personal/smartkey_load_personal— vocabulary persistence
🔍 Fuzzy Matching — Damerau-Levenshtein
When exact prefix fails, SmartKey falls back to fuzzy search with configurable edit distance:
- Handles substitutions, insertions, deletions, and transpositions
"functon"→"function"(1 edit),"hlelo"→"hello"(1 transposition)- Fuzzy results ranked below exact matches via configurable discount factor
- Benchmark: 505 µs for 1K corpus / 1 edit — fast enough for real-time
🧠 CVM Recency Decay
The CVM streaming counter now has time-aware vocabulary adaptation:
- Words you stop typing decay probabilistically over rounds
decay_lambdaparameter controls half-life (default: 0.1)refresh()resets decay clock on re-encounter — frequent words survive- Benchmark: 151 ns per frequency score — nanosecond-class personal boosting
📊 Criterion Benchmark Suite — Real Numbers
8 benchmark groups, 316 lines, measuring every component in isolation:
| Component | Median | Notes |
|---|---|---|
| Markov (no context) | 6.4 ns | Hash lookup, essentially O(1) |
| Markov (trigram backoff) | 125 ns | Katz interpolation |
| CVM frequency score | 151 ns | Streaming cardinality |
| Full predict (1K corpus) | 137 µs | Well under 16ms frame budget |
| Full predict (10K corpus) | 1.74 ms | Still real-time capable |
| Fuzzy search (1K, 1-edit) | 505 µs | Damerau-Levenshtein |
| Personal JSON round-trip | 62.5 µs | CVM snapshot save/load |
Architecture Evolution
v0.1.0: v0.2.0:
smartkey-core ← smartkey-py smartkey-core (Apache 2.0)
│ │ │
│ IBus (Linux) ┌─────┼─────────┐
└──────────────┘ │ │ │
smartkey-py │ smartkey-mac
(PyO3) │ (C FFI)
Linux smartkey-win macOS
(TSF/COM)
Windows
From 2 crates to 4. The core remains Apache 2.0 — embed it anywhere.
By the Numbers
| Metric | v0.1.0 | v0.2.0 |
|---|---|---|
| Platforms | 1 (Linux) | 3 (Linux, Windows, macOS) |
| Crates | 2 | 4 |
| Tests | 38 | 106 (+179%) |
| Benchmark groups | 0 | 8 |
| Corpus (EN) | ~3K unigrams | 100K+ unigrams |
| Corpus (BG) | ~2K unigrams | 100K+ unigrams |
| Integration tests | 0 | 5 |
| CI | None | GitHub Actions |
Full Changelog
Features
feat(core): Extract InputMethodCore — cross-platform key event state machinefeat(core): Fuzzy prefix matching with Damerau-Levenshtein fallbackfeat(cvm): Recency decay for frequency scoring (time-aware adaptation)feat(win): Windows TSF scaffold → COM DLL → composition → display attributes → ToUnicodeExfeat(mac): macOS IMK scaffold → C FFI → trigram + corpus file loadingfeat(py):from_config,export/import_personalin PyO3 bridgefeat(bench): Criterion benchmark suite (8 groups, 316 lines)feat(corpus): Scale EN and BG corpora to 100K+ unigrams + tech corpus (421 terms)
Fixes
fix(win): ToUnicodeEx for layout-aware key mapping (Shift, Cyrillic)fix(win): Smart key claiming — stops blocking other IMEsfix(win): Dead-key composition safety (dwFlags=4)fix(win): Honor riid in COM entry points, track live objectsfix(ibus): Graceful daemon shutdown (SIGTERM/SIGINT handlers)fix(ibus): Corpus deduplication (prevent .json + .bin double-loading)fix: 40+ code review findings (4 critical, 10 important, 26 low)fix(ci): Deterministic test_personal_boost + corpus tracking
Refactoring
refactor(ibus): Delegate corpus loading to Rust (removes 31 lines of Python)refactor: Consolidate config parsing, remove dead code
Documentation
docs(readme): Fix phantom features, mark completed roadmap itemsdocs(readme): Replace aspirational targets with actual benchmark numbersdocs: CI workflow + cross-platform README update
What's Next (Roadmap)
- Memory-mapped binary corpus format (mmap — eliminate JSON parse at startup)
- GTK4 popup panel for Alt-hold alternatives
- SQLite persistence for CVM personal vocabulary
- BK-tree or SymSpell for O(1) fuzzy lookup at scale
- Windows installer (MSI) + macOS .app bundle
- Mobile port (Android IME via JNI)
Install
git clone https://github.com/RMANOV/smartkey.git
cd smartkey
cargo build --release
cargo test --workspace # 106 tests
cargo bench # 8 benchmark groupsLinux (IBus): maturin develop -m crates/smartkey-py/Cargo.toml --release
Windows (TSF): cargo build -p smartkey-win --release → register DLL
macOS (IMK): cargo build -p smartkey-mac --release → link from Swift
Full diff: v0.1.0...v0.2.0
v0.1.0 — Genesis: The Keyboard That Learns You
SmartKey v0.1.0 — Genesis
The keyboard that learns you, not the other way around.
SmartKey is a system-wide predictive keyboard engine for Linux that predicts the next word — not the next paragraph. Pure Rust core. Zero cloud. Sub-microsecond lookups. Your typing patterns never leave your machine.
What's Inside
Prediction Engine (Rust, 1,258 LoC)
A 3-stage ensemble scorer that blends three independent signals into a single ranked prediction:
| Stage | Algorithm | Latency | What It Does |
|---|---|---|---|
| N-gram Trie | Aho-Corasick prefix search | < 1μs | Finds all words matching your typed prefix |
| Markov Chain | Katz backoff (trigram → bigram → unigram) | < 5μs | Scores candidates by context probability |
| CVM Counter | Streaming cardinality estimator | < 2μs | Boosts words you type frequently |
final_score = α · corpus + β · markov + γ · personal
= 0.4 · "how common" + 0.4 · "how likely here" + 0.2 · "how often YOU say this"
After a week of coding, async outranks assistant. After switching to Bulgarian, здравей surfaces automatically. No retraining. No config. Just typing.
IBus Integration (Python, 499 LoC)
Full IBus.Engine subclass with:
- Ghost text — top prediction appears inline as dimmed text
- Tab to accept entire word, → to accept one character, Esc to dismiss
- Super+Escape kill switch — instantly disables/re-enables
- Automatic context tracking (5-word sliding window)
- Multi-language corpus loading (EN + BG out of the box)
Corpus Pipeline (Python + Bash, 793 LoC)
Production-grade Wikipedia-to-JSON pipeline:
download_corpus.sh— downloads Wikipedia article dumps (any language)extract_wiki.py— streaming MediaWiki XML extractor (18 regex cleanup passes, skips redirects/stubs)build_corpus.py— sentence-aware tokenizer with contraction + Cyrillic support, streaming n-gram countercorpus_stats.py— quality metrics: Zipf exponent, hapax legomena, coverage curves
# Build a Bulgarian corpus from Wikipedia in one pipeline:
./download_corpus.sh --lang bg
python3 extract_wiki.py raw/bgwiki-*.xml.bz2 | python3 build_corpus.py --stdin --lang bg --min-freq 3By the Numbers
| Metric | Value |
|---|---|
| Rust LoC | 1,313 |
| Python LoC | 1,193 |
| Test suite | 38 tests, 0 failures |
| Modules tested | 5 (CVM, n-gram, Markov, prefix, ensemble) |
| Dependencies | Minimal (aho-corasick, rand, pyo3, serde) |
| Prediction latency | < 10μs target |
| Memory footprint | < 50MB target (EN+BG corpora) |
| Cloud dependency | Zero. Everything is local. |
The CVM Innovation
Most keyboards use frequency tables or neural nets. SmartKey uses the Chakraborty–Vinodchandran–Meel (CVM) algorithm — a streaming cardinality estimator that naturally provides:
- Vocabulary decay — words you stop typing get probabilistically evicted
- Automatic adaptation — no timestamps, no TTLs, no explicit forgetting
- O(log N) memory — tracks your entire vocabulary in kilobytes
- Zero-config language detection — per-language CVM counters detect script switches mid-sentence
cvm.frequency_score("function") // → 64.0 (you're a programmer)
cvm.frequency_score("synergy") // → 0.0 (you stopped writing corporate email)Dual License
- smartkey-core (Rust library) → Apache 2.0 — embed in your mobile keyboard, IDE plugin, or WASM app
- Application layer (IBus, corpus, PyO3) → GPL 3.0 — derivatives stay open
Quick Start
git clone https://github.com/RMANOV/smartkey.git
cd smartkey
cargo build --release
cargo test -p smartkey-core # 38 tests, 0 failures
# Build PyO3 bridge
pip install maturin
maturin develop -m crates/smartkey-py/Cargo.toml --release
# Register with IBus
cp ibus/smartkey.xml ~/.local/share/ibus/component/
ibus restartWhat's Next (Roadmap)
- Criterion benchmark suite with latency targets
- Memory-mapped binary corpus (mmap, no JSON parse at startup)
- GTK4 popup panel for Alt-hold alternatives
- SQLite persistence for CVM personal vocabulary
- Pre-built EN + BG corpus packages
- Typo correction via Levenshtein distance (Damerau-Levenshtein, v0.2.0)
- Android IME port via JNI
Built with Rust, Python, and obsessive attention to keystroke latency.