Releases: SemplificaAI/MikeRust
v0.5.2 — security hardening: drop S3 fallback, parquet size cap, platform-support clarification
MikeRust v0.5.2
Security-driven patch release on top of v0.5.1b. Three changes land
together — all resolve outstanding Dependabot findings without
touching application behaviour.
Known issue: Gemini 3.5 Flash is still not working.
Highlights
Drop the s3-storage feature and the AWS SDK chain
aws-sdk-s3 + aws-config are gone from Cargo.toml. Removing them
drops 70+ transitive crates from the lockfile, including:
- rustls 0.21.12 + rustls-webpki 0.101.7: the Dependabot
advisory that prompted this release. The AWS SDK chain was the
only path that pulled in the vulnerable 0.21 line of rustls; the
rest of the codebase (fastembed, sqlx, hf-hub) already uses
rustls 0.23 + rustls-webpki 0.103.13.
- aws-smithy-*, aws-sigv4, rustls-native-certs, sct, … the
full SDK transitive graph.
The S3/R2 path was always feature-gated OFF by default and the
s3-storage feature was never actually wired into
src/storage/mod.rs::make_storage — that function has only ever
returned LocalStorage. The trait stays for the ergonomic win of a
single Box<dyn Storage> handle and to keep the door open for a
sovereign-cloud backend on rustls 0.23 later. End-user behaviour
is unchanged.
local-storage is kept as a no-op feature so anyone who pinned it
in their own manifest still resolves cleanly.
Parquet shard size cap — mitigation for the Thrift advisory
The parquet crate (used by the Italian Legal Cassazione bulk
importer) transitively depends on thrift 0.17.0, which carries an
unfixed-upstream "Memory Allocation with Excessive Size Value"
advisory: a crafted Parquet footer can declare an allocation that
the decoder honours before validating against the actual stream
length, causing a DoS-style OOM. The Apache Thrift Rust bindings
have not shipped a fix; even the latest parquet 58 still depends
on thrift ^0.17, so a version bump alone would not help.
Mitigation lands as a hard byte cap applied before the bytes ever
reach the Parquet decoder:
- New file config/corpora.json holds the knob
max_parquet_file_size_mb (default 500 MB, comfortably above
any legitimate shard from the corpora we ingest, well below the
threshold at which a malicious footer would matter on a 16 GB
workstation).
- New module src/corpora/limits.rs is the loader. Same
env-override + ancestor-walk pattern as src/presets/model.rs
(MRUST_CORPORA_LIMITS env var). Falls back to defaults with a
warning if the JSON is missing or malformed.
- src/corpora/italian_legal.rs refuses to decode a shard above the
cap with a clear bail message.
- docs/CORPUS_PLUGINS.md gains a Security section explaining
the Thrift advisory, the analogous dila-bulk-xml concern
(oversized XML / tar-walker hardening), and concrete guidance for
plugin authors: pin URLs to official publishers, never accept
user-supplied URLs verbatim, and leave the cap at default unless you
have a specific reason to raise it.
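The byte cap described above can be sketched as a pre-decode guard. This is a minimal illustration, not the code in src/corpora/italian_legal.rs: the function name, the error wording and the choice of 1 MB = 1024 × 1024 bytes are all assumptions.

```rust
/// Default taken from the release notes (500 MB).
const DEFAULT_MAX_PARQUET_FILE_SIZE_MB: u64 = 500;

/// Hypothetical helper: refuse a shard whose byte length exceeds the cap.
/// Meant to run before any bytes are handed to the Parquet decoder.
/// Assumption: the cap is interpreted as max_mb * 1024 * 1024 bytes.
fn check_shard_size(len_bytes: u64, max_mb: u64) -> Result<(), String> {
    let cap_bytes = max_mb.saturating_mul(1024 * 1024);
    if len_bytes > cap_bytes {
        return Err(format!(
            "parquet shard is {len_bytes} bytes, above the {max_mb} MB cap \
             (max_parquet_file_size_mb)"
        ));
    }
    Ok(())
}
```

Checking the length first means a crafted footer never reaches the Thrift decoder, which is the point of the mitigation.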
Platform-support clarification (resolves the glib 0.18.5 finding)
README gains a "Supported platforms" section right at the top of
Quick start:
- Windows is the only currently shipping target (x86_64 + ARM64).
- macOS is on the roadmap but work hasn't started. The codebase
already compiles to aarch64-apple-darwin; gating items are
signing / notarisation and a Touch-ID equivalent of the
Windows-Hello unlock flow.
- Linux is not supported and there are no plans to add it.
The practical consequence for security scanners: the gtk / glib /
atk / webkit2gtk / tao-linux chain that Tauri pulls in for the
Linux WebView backend is not present in any shipped MSI —
webview2-com + windows-rs are what compile in on Windows.
Advisories on that chain (e.g. glib 0.18.5 flagged 2026-05-26) are
therefore inert for end users and tracked as "not affected — Linux
support is not in scope".
Downloads
Pre-built MSIs for Windows:
- MikeRust_0.5.2_x64.msi: Windows x86_64
- MikeRust_0.5.2_arm64.msi: Windows ARM64, Snapdragon X Elite native
Each bundles onnxruntime.dll 1.20.0 + pdfium.dll. Double-click to
install; runtime logs land in %USERPROFILE%\mikerust-data\mike-tauri.log.
Migration notes
- No new database migration. Schema unchanged from v0.5.1 (still
at migration 0030).
- v0.5.1 / v0.5.1b users: install the new MSI on top; no data
changes, no config changes required.
- Custom corpus plugin authors: read the new Security section
in docs/CORPUS_PLUGINS.md before pointing an hf-dataset-bulk
strategy at a third-party dataset. The max_parquet_file_size_mb
cap in config/corpora.json will refuse oversized shards by design.
- Anyone consuming the s3-storage cargo feature in a fork:
drop the feature flag from your build invocation. The feature is
gone; local-storage (a no-op) is still accepted for back-compat.
License
MikeRust is distributed under AGPL-3.0-only. The Semplifica
wordmark and logo are trademarks; see NOTICE.md. The full licence
text is available in-app under Settings → Licenza.
v0.5.1b — bugfix: Gemini sampler restored to default for versatile heterogeneous-document analysis
MikeRust v0.5.1b
Hotfix release on top of v0.5.1. Single fix: rolls back the
Gemini temperature override so the model stays versatile when
analysing heterogeneous documents (mixed-content medical records,
multi-format legal bundles, etc.) — the v0.5.1 tightening from 1.0
to 0.5 was making gemini-2.5-flash lose expressiveness on long,
varied inputs.
Everything else from v0.5.1 ships unchanged: hybrid bracket splitter,
cross-message citation lookup, tightened citation rules, orphan KB
cleanup endpoint, A4-fit DOCX viewer, diagnostic logging, and the
E2E test scaffold.
The fix
Gemini white-out on long heterogeneous contexts
Symptom. On gemini-2.5-flash, with a long context made of
varied document types (the "Timeline cronologica clinica" workflow
on three medical PDFs reproduced it cleanly), the model would emit
the first ~30 characters of real content (e.g. a Markdown table
header) and then get stuck in a low-entropy whitespace loop —
generating 155,010 characters of pure spaces in one observed run,
~1.96 million in another — before closing the stream. The user saw
a frozen response with no error.
Root cause. v0.5.1 set temperature = 0.5 on every provider
for citation determinism. On Claude / OpenAI / local that lower
temperature is fine. On Gemini 2.5 Flash specifically, with a
system prompt above ~45 kB plus heterogeneous tool-result content,
the low temperature collapsed the sampler into a repeating-token
regime that the model couldn't escape.
Fix. src/llm/gemini.rs no longer sets temperature on any
Gemini family. generationConfig is now only attached when
thinkingConfig is needed (2.5 family), and legacy 1.5 / 2.0
families get no generationConfig at all. The result: Gemini falls
back to its API default (~1.0), recovering the versatility that
heterogeneous-document analysis depends on, while Claude / OpenAI /
local keep their tighter 0.5 for deterministic citation output.
23 llm::gemini unit tests pass; the regression test
build_body_omits_generation_config_on_legacy_families now pins
the new behaviour.
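As a rough sketch of the rule described in the fix (not the actual src/llm/gemini.rs code), the decision can be modelled as a function that returns an optional config. The enum, the struct and its single placeholder field are assumptions made for illustration.

```rust
/// Model families named in the notes; the enum itself is an assumption.
#[derive(Clone, Copy, Debug, PartialEq)]
enum GeminiFamily {
    V15,
    V20,
    V25,
}

/// Hypothetical stand-in for the generationConfig payload.
/// There is deliberately no temperature field, so the API default applies.
#[derive(Debug, PartialEq)]
struct GenerationConfig {
    /// Placeholder for the thinkingConfig contents, which the notes do not detail.
    thinking_config: bool,
}

/// Attach a generationConfig only when thinkingConfig is needed (2.5 family).
/// Legacy 1.5 / 2.0 families get no generationConfig at all.
fn generation_config(family: GeminiFamily) -> Option<GenerationConfig> {
    match family {
        GeminiFamily::V25 => Some(GenerationConfig { thinking_config: true }),
        GeminiFamily::V15 | GeminiFamily::V20 => None,
    }
}
```

Returning `None` for the legacy families mirrors what the regression test name suggests: the request body simply omits the key.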
Downloads
Pre-built MSIs for Windows:
- MikeRust_0.5.1b_x64.msi: Windows x86_64
- MikeRust_0.5.1b_arm64.msi: Windows ARM64, Snapdragon X Elite native
Drop-in replacement for v0.5.1: same database schema (migration
0030 still the latest), same on-disk paths, same config. Install
on top of v0.5.1; no migration step.
Migration notes
- No new database migration. Schema unchanged from v0.5.1.
- v0.5.1 users: install MikeRust_0.5.1b_*.msi on top. The
citation pipeline, doc viewer, and KB cleanup endpoint are
byte-identical to v0.5.1; only the Gemini sampler config differs.
- v0.5.0 and earlier: the v0.5.1 release notes still apply on
top of this one for the citation overhaul + A4 viewer + orphan
cleanup changes.
License
MikeRust is distributed under AGPL-3.0-only. The Semplifica
wordmark and logo are trademarks; see NOTICE.md. The
full licence text is available in-app under Settings → Licenza.
v0.5.1 — citation pipeline overhaul + orphan KB cleanup + A4 doc viewer
MikeRust v0.5.1
Stable consolidation of four work-in-progress drops (formerly tagged
v0.5.1 → v0.5.4) into a single shippable release. v0.5.0 exposed a
clutch of citation-pipeline failures with mid-tier LLMs (Gemini 2.5
Flash, smaller local models): mixed-content brackets, dropped
cross-message references, orphan KB chunks surviving file deletions,
and run-to-run citation inconsistency. This release fixes all of them
— and, while we were in the doc viewer, adds an A4-fit DOCX preview
that auto-zooms when the side panel is resized.
Highlights
Citation pipeline — model-independent post-processors
- Hybrid bracket splitter (split_hybrid_citation_brackets).
Decomposes mixed-content brackets the model occasionally emits
([c1, c2, FILE.pdf, p.4, doc-7, doc-8]) into clean ones the
frontend MARKER_GROUP regex can pill-ify
([c1] [c2] [doc-id: FILE.pdf, page 4] [doc-id: doc-7] [doc-id: doc-8]).
Idempotent; stops at <CITATIONS> so it can't corrupt the
trailing JSON block (regression test pins this).
- Cross-message citation lookup (renderMessageHtml). When a
[cN] in the current turn has no matching annotation but an
earlier assistant turn in the same chat did, the pill resolves to
the older annotation. Catches the common case of models reusing
cN labels across turns.
- Tightened CITATION QUALITY RULES in MRUST_SYSTEM_PROMPT:
omit empty / short quotes; page ranges only for [[PAGE_BREAK]]
spans (else integer pages); prefer per-passage over per-document
citations; prefer attached doc-N over KB gN / pN; re-emit
cross-turn [cN] annotations in the current turn's
<CITATIONS> block.
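To make the hybrid-bracket example concrete, here is a simplified sketch that rewrites the contents of a single bracket. It is not split_hybrid_citation_brackets itself: the token rules (a `cN` label, an optional `p.N` page after a document name) are inferred from the one example above, and the real function's bracket scanning, idempotence check and <CITATIONS> stop are left out.

```rust
/// True for labels of the form c<digits>, e.g. "c1" or "c12".
fn is_c_label(tok: &str) -> bool {
    tok.strip_prefix('c')
        .map_or(false, |rest| !rest.is_empty() && rest.chars().all(|c| c.is_ascii_digit()))
}

/// Hypothetical helper: turn the inside of one mixed bracket into clean brackets.
/// A token followed by "p.<n>" is treated as a document with a page number.
fn split_bracket(inner: &str) -> String {
    let tokens: Vec<&str> = inner
        .split(',')
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .collect();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        let tok = tokens[i];
        if is_c_label(tok) {
            out.push(format!("[{tok}]"));
        } else if let Some(page) = tokens.get(i + 1).and_then(|t| t.strip_prefix("p.")) {
            out.push(format!("[doc-id: {tok}, page {}]", page.trim()));
            i += 1; // the page token has been consumed together with the document
        } else {
            out.push(format!("[doc-id: {tok}]"));
        }
        i += 1;
    }
    out.join(" ")
}
```

Called on `"c1, c2, FILE.pdf, p.4, doc-7, doc-8"`, the sketch yields the clean form quoted in the bullet above.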
Cross-provider determinism + headroom
- temperature = 0.5 on every LLM provider
(Anthropic / Gemini / OpenAI / local-OpenAI-compatible).
Defaults were 1.0 across the board, too random for structured
output. Lowering it makes citation extraction reproducible run to
run across all four providers.
- max_tokens 4096 → 8192 on Claude + local. Doubles the
headroom for trailing <CITATIONS> JSON on long answers; Gemini
was already on its default (≥8192 on the 2.x family).
Orphan KB chunks — chat-time filter + cleanup endpoint
User-reported failure mode: removing a synced doc from the UI left
its embeddings behind, and every chat turn the cosine retrieval kept
surfacing those stale chunks — one chat ended up emitting 12
citations all pointing to the same dead PDF page.
- retrieve_kb_chunks now probes each chunk's source_path on
disk and drops missing ones with an
[rag] orphan KB chunk dropped … warning + per-turn summary.
- New endpoint POST /sync/cleanup-orphans: per-user cascade
delete of documents + doc_chunks + synced_files rows whose
backing file is gone. Returns
{ scanned, orphans, deleted_docs, deleted_chunks, deleted_synced }.
- Frontend modal: when a citation source 404s, the doc viewer
surfaces a warning panel with a Pulisci sorgenti rimosse button
that calls the cleanup endpoint and toasts the row count.
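The chat-time filter can be pictured as a simple existence probe over the retrieved chunks. The sketch below is an assumption-laden illustration: the `KbChunk` struct, the function name and the log wording are invented here, and only the `source_path` field name comes from the notes.

```rust
use std::path::Path;

/// Minimal stand-in for a retrieved KB chunk.
#[derive(Debug, Clone, PartialEq)]
struct KbChunk {
    source_path: String,
    text: String,
}

/// Keep chunks whose backing file still exists on disk.
/// Returns the surviving chunks plus the number dropped (for a per-turn summary).
fn drop_orphan_chunks(chunks: Vec<KbChunk>) -> (Vec<KbChunk>, usize) {
    let before = chunks.len();
    let kept: Vec<KbChunk> = chunks
        .into_iter()
        .filter(|c| {
            let alive = Path::new(&c.source_path).exists();
            if !alive {
                // Illustrative log line only; the real warning text is not reproduced here.
                eprintln!("dropping chunk with missing source file: {}", c.source_path);
            }
            alive
        })
        .collect();
    let dropped = before - kept.len();
    (kept, dropped)
}
```

Filtering at retrieval time only hides stale chunks for the current turn; removing the rows themselves is the job of the cleanup endpoint.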
DocxView — A4 fit (default) + reflow toggle
Preserves the document's native A4 page geometry (width + height +
margins, breakPages: true) and applies CSS
zoom = containerWidth / pageWidth (clamped to [0.4, 1.5]) via a
ResizeObserver so the page auto-scales when the user drags the
side-panel divider. A top-right toggle flips to a reflow mode (drops
the page geometry, prose flows the full panel width) for narrow
side-panel reading. The ResizeObserver is detached in reflow mode, so
the auto-scaling adds no cost when it is not in use.
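The zoom rule is plain arithmetic, shown here in Rust for illustration only; the actual viewer applies it as CSS zoom from frontend code. The function name and the fallback for a non-positive page width are assumptions.

```rust
const MIN_ZOOM: f64 = 0.4;
const MAX_ZOOM: f64 = 1.5;

/// zoom = containerWidth / pageWidth, clamped to [0.4, 1.5].
/// Assumption: a non-positive page width falls back to a zoom of 1.0.
fn fit_zoom(container_width: f64, page_width: f64) -> f64 {
    if page_width <= 0.0 {
        return 1.0;
    }
    (container_width / page_width).clamp(MIN_ZOOM, MAX_ZOOM)
}
```

For example, an A4 page is about 794 CSS px wide at 96 dpi, so a 397 px panel gives a zoom of 0.5, while a 100 px panel bottoms out at 0.4 and a 2000 px panel tops out at 1.5.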
Diagnostic logging
- [rag][cite-diag] retrieve_kb_chunks … spells out HyDE on/off +
locale + domain + top-K and clarifies that base cosine retrieval
always runs regardless of HyDE (removes the recurring "I turned
HyDE off, why is it still searching?" confusion).
- [chat][cite-diag] … events trace each step of the citation
pipeline: final response shape, tail dump, per-step outcome, FINAL
SSE payload size.
- [chat] <CITATIONS> block found but is not valid JSON warning
dumps head / mid / tail (300 chars each) of the offending payload
for offline diagnosis.
E2E test scaffold
New tests/medical_citations_e2e.rs bypasses the frontend entirely:
places 10 PDFs from tests/medical/ straight into cache + documents
rows, exercises the real POST /chat handler via
tower::ServiceExt::oneshot, parses the SSE stream, and prints a
structured JSON report of citation quality. Gated by #[ignore] +
GEMINI_API_KEY; A/B switches via E2E_HYDE / E2E_MODEL.
$env:GEMINI_API_KEY = "..."
cargo test --test medical_citations_e2e --features rag,pdf `
    -- --ignored --nocapture
v0.4.7 — version badge fix + chat-files popover + License panel
MikeRust v0.4.7
Rollup release that closes a string of UX and persistence gaps reported during the v0.4.x review cycle, plus a hotfix for the visible-version regression introduced in v0.4.6.
Highlights
Generated-docx Accept / Reject flow — full lifecycle
When the model emits a docx the user can now Accept it (keep in chat context) or Reject it (replace with an LLM-generated summary anchored on a mandatory user motive). Re-Accept restores the original; Re-Reject overwrites the archive with a fresh summary. A new "Vedi riassunto" read-only modal surfaces the archived reason + summary after the reject modal closes, so the user can re-read what the model now sees in place of the document.
Backend: documents.decision / decision_reason / decision_summary columns (migration 0029); POST /document/:id/decision runs the summariser and persists; chat::load_attached_docs substitutes the body with a reason + summary stub on every subsequent turn.
Chat-files popover — five categories, one shortcut
New Files button in the composer footer opens a popover listing every document the chat has ever interacted with, across the five categories the chat archive is expected to retain:
| Origin | Where it comes from |
|---|---|
| Caricato | composer paperclip → documents.chat_id |
| Generato | generate_docx tool → documents.chat_id |
| Rifiutato (variant) | any of the above with decision='rejected' (strikethrough + red badge) |
| Progetto | chats.project_id → documents.project_id |
| Citato | KB / corpora docs cited via messages.annotations |
Per-format icon colours (Excel green / Word blue / PDF red / PowerPoint orange / Markdown text-primary). Click a row to open it in the existing doc-viewer side panel, where Accept / Reject / Vedi riassunto / Apri in Word all already work. Backed by a new GET /chat/:id/documents endpoint that survives chat reload, chat switching and message compaction.
Version badge + License panel
Small v{version} badge next to "MikeRust" in the sidebar so the user always knows which build is running. New Settings → Licenza panel shows MikeRust + version, the SPDX identifier (AGPL-3.0-only), a plain-language summary of the AGPL terms and the full bundled LICENSE text in a scrollable monospace block.
Multi-doc anamnesis docx (v0.3.6)
Fixed a Gemini tool-code crash on multi-document anamnesis flows and a doc-label off-by-one (1-indexed labels now match [doc-N] references).
Full changelog by patch
- v0.4.7: version badge actually renders (replaced runtime getVersion() + $state with build-time package.json import)
- v0.4.6: version label + License settings panel (broken; superseded by 0.4.7)
- v0.4.5: chat-files popover surfaces all 5 doc categories (project + KB-referenced)
- v0.4.4: chat-files popover backend-sourced, survives reload (new GET /chat/:id/documents)
- v0.4.3: chat-files popover MVP
- v0.4.2: fix reject modal step-2 transition (untrack(initialReason) in $effect)
- v0.4.1: persistent "Vedi riassunto" for rejected docs
- v0.4.0: domain-aware system-prompt prologue (66 .md files × 6 locales × 11 domains)
See HISTORY.md for the per-patch details and rationale.
Downloads
Pre-built MSIs for Windows:
- MikeRust_0.4.7_x64.msi: Windows x86_64
- MikeRust_0.4.7_arm64.msi: Windows ARM64 (Snapdragon X Elite native)
Each bundles onnxruntime.dll 1.20.0 and pdfium.dll. Double-click to install; runtime logs land in %USERPROFILE%\mikerust-data\mike-tauri.log.
License
MikeRust is distributed under AGPL-3.0-only. The Semplifica wordmark and logo are trademarks; see NOTICE.md. The full licence text is now also available in-app under Settings → Licenza.
v0.3.2 — tool iter cap, Ollama probe proxy, biometric icon
Three independent fixes in one release.
Fixed — chat stops with "too many tool iterations" on multi-doc workflows
Reported on Gemini 2.5 Flash with a medical-anamnesis workflow attached to ten clinical-record PDFs: the model called read_document on a couple of files, then aborted with "stopped: too many tool iterations". MAX_TOOL_ITERATIONS = 5 in src/routes/chat.rs was a holdover from single-doc debug runs; legitimate due-diligence / medical-anamnesis flows need 5–15 source-doc reads before composing the answer. Bumped to 20 — bounds a runaway loop at ~20× the per-turn latency while comfortably covering ten-doc anamnesis flows. Fix applies to every LLM that does tool-use (Gemini, Claude, OpenAI, local Ollama / vLLM, …).
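A bounded tool loop of the kind described can be sketched as follows. Only the constant name, its new value and the abort message come from the notes; the `Step` enum and the closure-driven loop are stand-ins for the real chat handler in src/routes/chat.rs.

```rust
/// New cap from this release (previously 5).
const MAX_TOOL_ITERATIONS: usize = 20;

/// What one model step asks for; a stand-in for the real LLM response type.
enum Step {
    ToolCall(String),
    FinalAnswer(String),
}

/// Drive the model until it answers or the iteration cap is hit.
/// `next_step` is a hypothetical callback that performs one model round-trip.
fn run_tool_loop(mut next_step: impl FnMut(usize) -> Step) -> Result<String, String> {
    for iteration in 0..MAX_TOOL_ITERATIONS {
        match next_step(iteration) {
            Step::FinalAnswer(text) => return Ok(text),
            // A real handler would execute the named tool and feed the result back.
            Step::ToolCall(_tool_name) => {}
        }
    }
    Err("stopped: too many tool iterations".to_string())
}
```

With the cap at 20, a flow that reads ten documents before answering completes normally, while a model that never stops calling tools is still cut off.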
Fixed — Settings → "Modelli LLM" probe CORS-blocked
The Settings page used to issue fetch(${base}/models) directly from the WebView origin http://tauri.localhost. External Ollama / llama-server / vLLM instances rarely whitelist that origin, so the browser blocked every probe with the "No 'Access-Control-Allow-Origin' header is present" message and the model dropdown stayed empty.
- New backend endpoint GET /models/local/probe?base=…&api_key=… does the server-to-server fetch (no Origin involved → no CORS) and returns the upstream payload verbatim, plus typed error fields (upstream_status, error).
- ModelsSection.svelte now goes through the proxy instead of fetching the runtime directly.
- The chat path was never affected: chat-time LLM calls already go through the Rust reqwest client.
Fixed — biometric unlock dialog used a hand-drawn fingerprint
BiometricPrompt.svelte rendered its fingerprint via seven inline SVG paths. The result looked off-centre and hairline-thin against the brand-500 background and didn't match the lucide-svelte icon family the rest of the app uses. Swapped for <Fingerprint size={40} class="text-(--color-brand-500)" /> from lucide-svelte. The currentColor inheritance preserves the brand-tint behaviour, so no styling change at the call site.
Installer artefacts
- MikeRust_0.3.2_x64.msi: Windows x86_64
- MikeRust_0.3.2_arm64.msi: Windows ARM64
See HISTORY.md for the cumulative v0.2.x → v0.3.x timeline.
release: v0.3.1 — bundle JSON config registries in the MSI
The JSON config registries (configs, workflows, templates) were missing from the MSI; this release bundles them.
v0.3.0 — installed-MSI storage ACCESS_DENIED fix
Symptom in v0.2.x installed MSI: uploading a document or opening
the viewer surfaced "Could not load document — Accesso negato
(os error 5)" and the backend logged ACCESS_DENIED on storage writes.
Root cause: STORAGE_PATH defaulted to the cwd-relative
./data/storage/. For dev (cargo run from workspace root) that
worked; for an MSI launched from a Start-menu shortcut the cwd is
typically C:\Program Files\MikeRust\ — admin-only for non-elevated
processes, so create_dir_all failed with os error 5 and every
subsequent put returned HTTP 500.
Fix: LocalStorage::new() now defaults to
<USERPROFILE|HOME>/mikerust-data/storage/, the same user-writable
directory the SQLite DB and the v0.2.5 PII cache already live under.
The STORAGE_PATH env override stays for tests / fixtures / the
standalone-backend dev story.
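The resolution order described above can be sketched as a small pure function. The function name, the injected `env` lookup and the choice to try USERPROFILE before HOME are assumptions; the real logic lives in LocalStorage::new().

```rust
use std::path::PathBuf;

/// Resolve the storage directory:
/// 1. an explicit STORAGE_PATH override wins;
/// 2. otherwise <USERPROFILE|HOME>/mikerust-data/storage/.
/// `env` is injected so the rule can be exercised without touching the process environment.
fn default_storage_dir(env: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    if let Some(explicit) = env("STORAGE_PATH") {
        return Some(PathBuf::from(explicit));
    }
    // Assumption: USERPROFILE (Windows) is tried first, then HOME.
    let home = env("USERPROFILE").or_else(|| env("HOME"))?;
    Some(PathBuf::from(home).join("mikerust-data").join("storage"))
}
```

In an application this would be called as `default_storage_dir(|key| std::env::var(key).ok())`. Because the result no longer depends on the working directory, a Start-menu launch from C:\Program Files\MikeRust\ resolves to a user-writable location.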
Installer artefacts
- MikeRust_0.3.0_x64.msi: Windows x86_64
- MikeRust_0.3.0_arm64.msi: Windows ARM64
See HISTORY.md for the full v0.2.x → v0.3.0 timeline (PII vertical, SSE refactor, cold-launch fixes, the leak-fix cascade, and this storage path patch).
MikeRust 0.2.7
v0.2.7 — close remaining PII bypasses
v0.2.6 closed the inline-attached redaction path for follow-up turns,
but two parallel routes kept feeding the LLM raw, unredacted text.
Confirmed on disk: the cache/pii/<doc_id>.txt cache had every
license plate masked to [LICENSE_PLATE], yet the model answer
still quoted the original plates — the leak was via the RAG
retrieval branch and (potentially) via the read_document /
find_in_document tools.
Fix
- retrieve_kb_chunks now consults documents.pii_protected per
chunk and drops every chunk whose source document is flagged.
Each drop is logged in mike-tauri.log at info level.
- resolve_doc + a new read_doc_text_for_llm helper substitute
the redacted cache file for the raw bytes whenever the document
is protected. A cache miss is a hard safety stop ("wait for the
inline-attached pass to finish") instead of falling back to the
raw text.
- edit_document refuses outright when targeting a protected
document: overwriting the raw docx and offering it as a
download would re-expose every entity the redacted cache had
carefully masked.
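The "cache miss is a hard stop" rule is the key safety property, and it can be sketched in a few lines. The signature below is hypothetical (the real read_doc_text_for_llm helper presumably reads from the database and cache/pii/ on disk); the sketch shows only the decision logic.

```rust
/// Hypothetical decision logic for what text an LLM tool may see.
/// - unprotected document: raw text is returned;
/// - protected document with a redacted cache: the redacted text is returned;
/// - protected document without a cache: refuse, never fall back to raw text.
fn read_doc_text_for_llm_sketch(
    pii_protected: bool,
    redacted_cache: Option<&str>,
    raw_text: &str,
) -> Result<String, String> {
    if !pii_protected {
        return Ok(raw_text.to_string());
    }
    match redacted_cache {
        Some(redacted) => Ok(redacted.to_string()),
        None => Err(
            "document is PII-protected and its redacted cache is not ready; \
             wait for the inline-attached pass to finish"
                .to_string(),
        ),
    }
}
```

Failing closed here costs a retry for the user, whereas falling back to the raw text would reintroduce exactly the leak this release closes.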
Installer artefacts
- MikeRust_0.2.7_x64.msi: Windows x86_64
- MikeRust_0.2.7_arm64.msi: Windows ARM64
See HISTORY.md for the full themed entry covering this and the v0.2.2 → v0.2.6 lead-up patches.