27 May 21:06

89a979e

v0.5.2 — security hardening: drop S3 fallback, parquet size cap, platform-support clarification

MikeRust v0.5.2

Security-driven patch release on top of v0.5.1b. Three changes land
together — all resolve outstanding Dependabot findings without
touching application behaviour.

Known issue: Gemini 3.5 Flash is still not working.

Highlights

Drop the `s3-storage` feature and the AWS SDK chain

aws-sdk-s3 + aws-config are gone from Cargo.toml. Removing them
drops 70+ transitive crates from the lockfile, including:

- rustls 0.21.12 + rustls-webpki 0.101.7 — the Dependabot
  advisory that prompted this release. The AWS SDK chain was the
  only path that pulled in the vulnerable 0.21 line of rustls; the
  rest of the codebase (fastembed, sqlx, hf-hub) already uses
  rustls 0.23 + rustls-webpki 0.103.13.
- aws-smithy-*, aws-sigv4, rustls-native-certs, sct, … the
  full SDK transitive graph.

The S3/R2 path was always feature-gated OFF by default and the
s3-storage feature was never actually wired into
src/storage/mod.rs::make_storage — that function has only ever
returned LocalStorage. The trait stays for the ergonomic win of a
single Box<dyn Storage> handle and to keep the door open for a
sovereign-cloud backend on rustls 0.23 later. End-user behaviour
is unchanged.

local-storage is kept as a no-op feature so anyone who pinned it
in their own manifest still resolves cleanly.
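
To illustrate why keeping the trait is cheap even with a single backend, here is a minimal sketch of the `Box<dyn Storage>` pattern. The method names `put` and `get`, the `root` parameter, and the signature of `make_storage` are assumptions made for this example; the notes above do not show the real trait in src/storage/mod.rs.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical minimal storage interface (method set assumed for the sketch).
pub trait Storage {
    fn put(&self, key: &str, bytes: &[u8]) -> io::Result<()>;
    fn get(&self, key: &str) -> io::Result<Vec<u8>>;
}

/// Filesystem-backed implementation rooted at a directory.
pub struct LocalStorage {
    root: PathBuf,
}

impl LocalStorage {
    pub fn new(root: PathBuf) -> Self {
        Self { root }
    }
}

impl Storage for LocalStorage {
    fn put(&self, key: &str, bytes: &[u8]) -> io::Result<()> {
        // Create the root lazily so a fresh install works on first write.
        fs::create_dir_all(&self.root)?;
        fs::write(self.root.join(key), bytes)
    }

    fn get(&self, key: &str) -> io::Result<Vec<u8>> {
        fs::read(self.root.join(key))
    }
}

/// Single construction point: callers only see the trait object, so another
/// backend could be returned here later without touching call sites.
pub fn make_storage(root: PathBuf) -> Box<dyn Storage> {
    Box::new(LocalStorage::new(root))
}
```

Call sites hold only the boxed trait object, which is what would let a different backend slot in behind `make_storage` later.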

Parquet shard size cap — mitigation for the Thrift advisory

The parquet crate (used by the Italian Legal Cassazione bulk
importer) transitively depends on thrift 0.17.0, which carries an
unfixed-upstream "Memory Allocation with Excessive Size Value"
advisory: a crafted Parquet footer can declare an allocation that
the decoder honours before validating against the actual stream
length, causing a DoS-style OOM. The Apache Thrift Rust bindings
have not shipped a fix; even the latest parquet 58 still depends
on thrift ^0.17, so a version bump alone would not help.

Mitigation lands as a hard byte cap applied before the bytes ever
reach the Parquet decoder:

- New file config/corpora.json holds the knob
  max_parquet_file_size_mb (default 500 MB — comfortably above
  any legitimate shard from the corpora we ingest, well below the
  threshold at which a malicious footer would matter on a 16 GB
  workstation).
- New module src/corpora/limits.rs is the loader. Same
  env-override + ancestor-walk pattern as src/presets/model.rs
  (MRUST_CORPORA_LIMITS env var). Falls back to defaults with a
  warning if the JSON is missing or malformed.
- src/corpora/italian_legal.rs refuses to decode a shard above the
  cap with a clear bail message.
- docs/CORPUS_PLUGINS.md gains a Security section explaining
  the Thrift advisory, the dila-bulk-xml analogous concern
  (oversized XML / tar-walker hardening), and concrete guidance for
  plugin authors — pin URLs to official publishers, never accept
  user-supplied URLs verbatim, leave the cap at default unless you
  have a specific reason to raise it.
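
The cap-before-decode idea can be sketched as follows. This is an illustration under stated assumptions, not the code in src/corpora/limits.rs: the type `CorporaLimits`, the functions `from_parsed` and `check_shard_size`, the error text, and the reading of "MB" as 1024 × 1024 bytes are all hypothetical. JSON parsing is left out so the sketch stays standard-library only.

```rust
/// Default cap stated in the release notes: 500 MB.
pub const DEFAULT_MAX_PARQUET_FILE_SIZE_MB: u64 = 500;

/// Hypothetical holder for the `max_parquet_file_size_mb` knob.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CorporaLimits {
    pub max_parquet_file_size_mb: u64,
}

impl Default for CorporaLimits {
    fn default() -> Self {
        Self { max_parquet_file_size_mb: DEFAULT_MAX_PARQUET_FILE_SIZE_MB }
    }
}

impl CorporaLimits {
    /// `None` stands for a missing or malformed config file: warn and fall
    /// back to the default instead of failing.
    pub fn from_parsed(value: Option<u64>) -> Self {
        match value {
            Some(mb) => Self { max_parquet_file_size_mb: mb },
            None => {
                eprintln!("warning: corpora limits unavailable, using defaults");
                Self::default()
            }
        }
    }

    /// Cap in bytes (assumption: 1 MB = 1024 * 1024 bytes).
    pub fn max_bytes(&self) -> u64 {
        self.max_parquet_file_size_mb.saturating_mul(1024 * 1024)
    }
}

/// Reject an oversized shard before any byte reaches the Parquet decoder.
pub fn check_shard_size(shard_len_bytes: u64, limits: &CorporaLimits) -> Result<(), String> {
    let cap = limits.max_bytes();
    if shard_len_bytes > cap {
        return Err(format!(
            "parquet shard is {shard_len_bytes} bytes, above the {cap}-byte cap; refusing to decode"
        ));
    }
    Ok(())
}
```

The check only needs the shard length, so it can run on the download size or file metadata before the decoder is ever constructed.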

Platform-support clarification (resolves the `glib 0.18.5` finding)

README gains a "Supported platforms" section right at the top of
Quick start:

- Windows is the only currently shipping target (x86_64 + ARM64).
- macOS is on the roadmap but work hasn't started — codebase
  compiles to aarch64-apple-darwin already; gating items are
  signing / notarisation and a Touch-ID equivalent of the
  Windows-Hello unlock flow.
- Linux is not supported and there are no plans to add it.

The practical consequence for security scanners: the gtk / glib /
atk / webkit2gtk / tao-linux chain that Tauri pulls in for the
Linux WebView backend is not present in any shipped MSI —
webview2-com + windows-rs are what compile in on Windows.
Advisories on that chain (e.g. glib 0.18.5 flagged 2026-05-26) are
therefore inert for end users and tracked as "not affected — Linux
support is not in scope".

Downloads

Pre-built MSIs for Windows:

MikeRust_0.5.2_x64.msi — Windows x86_64
MikeRust_0.5.2_arm64.msi — Windows ARM64, Snapdragon X Elite native

Each bundles onnxruntime.dll 1.20.0 + pdfium.dll. Double-click to
install; runtime logs land in %USERPROFILE%\mikerust-data\mike-tauri.log.

Migration notes

- No new database migration. Schema unchanged from v0.5.1 (still
  at migration 0030).
- v0.5.1 / v0.5.1b users: install the new MSI on top — no data
  changes, no config changes required.
- Custom corpus plugin authors: read the new Security section
  in docs/CORPUS_PLUGINS.md before pointing a hf-dataset-bulk
  strategy at a third-party dataset. The max_parquet_file_size_mb
  cap in config/corpora.json will refuse oversized shards by design.
- Anyone consuming the s3-storage cargo feature in a fork:
  drop the feature flag from your build invocation. The feature is
  gone; local-storage (a no-op) is still accepted for back-compat.

License

MikeRust is distributed under AGPL-3.0-only. The Semplifica
wordmark and logo are trademarks; see NOTICE.md. The full licence
text is available in-app under Settings → Licenza.

26 May 17:06

dariofinardi

v0.5.1b

49064f0

v0.5.1b — bugfix: Gemini sampler restored to default for versatile heterogeneous-document analysis

MikeRust v0.5.1b

Hotfix release on top of v0.5.1. Single fix: rolls back the
Gemini temperature override so the model stays versatile when
analysing heterogeneous documents (mixed-content medical records,
multi-format legal bundles, etc.) — the v0.5.1 tightening from 1.0
to 0.5 was making gemini-2.5-flash lose expressiveness on long,
varied inputs.

Everything else from v0.5.1 ships unchanged: hybrid bracket splitter,
cross-message citation lookup, tightened citation rules, orphan KB
cleanup endpoint, A4-fit DOCX viewer, diagnostic logging, and the
E2E test scaffold.

The fix

Gemini white-out on long heterogeneous contexts

Symptom. On gemini-2.5-flash, with a long context made of
varied document types (the "Timeline cronologica clinica" workflow
on three medical PDFs reproduced it cleanly), the model would emit
the first ~30 characters of real content (e.g. a Markdown table
header) and then get stuck in a low-entropy whitespace loop —
generating 155,010 characters of pure spaces in one observed run,
~1.96 million in another — before closing the stream. The user saw
a frozen response with no error.

Root cause. v0.5.1 set temperature = 0.5 on every provider
for citation determinism. On Claude / OpenAI / local that lower
temperature is fine. On Gemini 2.5 Flash specifically, with a
system prompt above ~45 kB plus heterogeneous tool-result content,
the low temperature collapsed the sampler into a repeating-token
regime that the model couldn't escape.

Fix. src/llm/gemini.rs no longer sets temperature on any
Gemini family. generationConfig is now only attached when
thinkingConfig is needed (2.5 family), and legacy 1.5 / 2.0
families get no generationConfig at all. The result: Gemini falls
back to its API default (~1.0), recovering the versatility that
heterogeneous-document analysis depends on, while Claude / OpenAI /
local keep their tighter 0.5 for deterministic citation output.

23 llm::gemini unit tests pass; the regression test
build_body_omits_generation_config_on_legacy_families now pins
the new behaviour.
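
The rule described in the fix can be sketched as a small decision function. This is a hedged illustration rather than the code in src/llm/gemini.rs: the struct layout, the `gemini-2.5` prefix check, and the `thinking_budget` parameter are assumptions made for the example.

```rust
/// Hypothetical mirror of the `thinkingConfig` object.
#[derive(Debug, PartialEq)]
pub struct ThinkingConfig {
    pub thinking_budget: i32,
}

/// Hypothetical mirror of `generationConfig`. `temperature` stays `None`
/// so the API default applies.
#[derive(Debug, PartialEq)]
pub struct GenerationConfig {
    pub temperature: Option<f32>,
    pub thinking_config: Option<ThinkingConfig>,
}

/// Assumption: the 2.5 family is recognised by its model-name prefix.
pub fn needs_thinking_config(model: &str) -> bool {
    model.starts_with("gemini-2.5")
}

/// Returns `None` for legacy families (no `generationConfig` at all) and a
/// config without temperature for the 2.5 family.
pub fn generation_config_for(model: &str, thinking_budget: i32) -> Option<GenerationConfig> {
    if !needs_thinking_config(model) {
        return None;
    }
    Some(GenerationConfig {
        temperature: None,
        thinking_config: Some(ThinkingConfig { thinking_budget }),
    })
}
```

Returning `Option` makes the "omit the whole object" case explicit, which is the behaviour the regression test named above is said to pin.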

Downloads

Pre-built MSIs for Windows:

MikeRust_0.5.1b_x64.msi — Windows x86_64
MikeRust_0.5.1b_arm64.msi — Windows ARM64, Snapdragon X Elite native

Drop-in replacement for v0.5.1: same database schema (migration
0030 still the latest), same on-disk paths, same config. Install
on top of v0.5.1; no migration step.

Migration notes

- No new database migration. Schema unchanged from v0.5.1.
- v0.5.1 users: install MikeRust_0.5.1b_*.msi on top — the
  citation pipeline, doc viewer, and KB cleanup endpoint are
  byte-identical to v0.5.1, only the Gemini sampler config differs.
- v0.5.0 and earlier: the v0.5.1 release notes still apply on
  top of this one for the citation overhaul + A4 viewer + orphan
  cleanup changes.

License

MikeRust is distributed under AGPL-3.0-only. The Semplifica
wordmark and logo are trademarks; see NOTICE.md. The
full licence text is available in-app under Settings → Licenza.

26 May 12:39

dariofinardi

v0.5.1

62b49cd

v0.5.1 — citation pipeline overhaul + orphan KB cleanup + A4 doc viewer

MikeRust v0.5.1

Stable consolidation of four work-in-progress drops (formerly tagged
v0.5.1 → v0.5.4) into a single shippable release. v0.5.0 exposed a
clutch of citation-pipeline failures with mid-tier LLMs (Gemini 2.5
Flash, smaller local models): mixed-content brackets, dropped
cross-message references, orphan KB chunks surviving file deletions,
and run-to-run citation inconsistency. This release fixes all of them
— and, while we were in the doc viewer, adds an A4-fit DOCX preview
that auto-zooms when the side panel is resized.

Highlights

Citation pipeline — model-independent post-processors

- Hybrid bracket splitter (split_hybrid_citation_brackets).
  Decomposes mixed-content brackets the model occasionally emits
  ([c1, c2, FILE.pdf, p.4, doc-7, doc-8]) into clean ones the
  frontend MARKER_GROUP regex can pill-ify
  ([c1] [c2] [doc-id: FILE.pdf, page 4] [doc-id: doc-7] [doc-id: doc-8]).
  Idempotent; stops at <CITATIONS> so it can't corrupt the
  trailing JSON block (regression test pins this).
- Cross-message citation lookup (renderMessageHtml). When a
  [cN] in the current turn has no matching annotation but an
  earlier assistant turn in the same chat did, the pill resolves to
  the older annotation. Catches the common case of models reusing
  cN labels across turns.
- Tightened CITATION QUALITY RULES in MRUST_SYSTEM_PROMPT:
  omit empty / short quotes; page ranges only for [[PAGE_BREAK]]
  spans (else integer pages); prefer per-passage over per-document
  citations; prefer attached doc-N over KB gN/pN; re-emit
  cross-turn [cN] annotations in the current turn's
  <CITATIONS> block.
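
As an illustration of the bracket-splitting step, the sketch below reproduces the example given above using only the standard library. It is not the shipped `split_hybrid_citation_brackets`: the token rules (a `cN` label, a `p.N` page token attached to the preceding document, everything else treated as a document reference) and the helper names are assumptions, and this simplified heuristic would also rewrite ordinary comma-separated brackets in prose.

```rust
/// True for labels such as `c1` or `c12`.
fn is_c_label(tok: &str) -> bool {
    tok.strip_prefix('c')
        .map_or(false, |rest| !rest.is_empty() && rest.bytes().all(|b| b.is_ascii_digit()))
}

/// Extracts `4` from a page token such as `p.4`.
fn page_number(tok: &str) -> Option<&str> {
    let rest = tok.strip_prefix("p.")?.trim();
    (!rest.is_empty() && rest.bytes().all(|b| b.is_ascii_digit())).then_some(rest)
}

/// Rewrites the inside of one bracket, or returns `None` to leave it as-is.
fn split_one(inner: &str) -> Option<String> {
    // Single-item brackets and already-clean `doc-id:` brackets are left
    // alone, which is what makes a second pass a no-op.
    if !inner.contains(',') || inner.trim_start().starts_with("doc-id:") {
        return None;
    }
    let mut parts: Vec<String> = Vec::new();
    let mut last_is_doc = false;
    for tok in inner.split(',').map(str::trim).filter(|t| !t.is_empty()) {
        if is_c_label(tok) {
            parts.push(format!("[{tok}]"));
            last_is_doc = false;
        } else if let Some(page) = page_number(tok) {
            // A page token qualifies the document reference just before it.
            if !last_is_doc {
                return None;
            }
            let doc = parts.last_mut()?;
            doc.pop(); // drop the closing ']'
            doc.push_str(&format!(", page {page}]"));
            last_is_doc = false;
        } else {
            parts.push(format!("[doc-id: {tok}]"));
            last_is_doc = true;
        }
    }
    (!parts.is_empty()).then(|| parts.join(" "))
}

pub fn split_hybrid_citation_brackets(text: &str) -> String {
    // Everything from <CITATIONS> onwards is copied through untouched.
    let cut = text.find("<CITATIONS>").unwrap_or(text.len());
    let (body, tail) = text.split_at(cut);
    let mut out = String::with_capacity(text.len());
    let mut rest = body;
    while let Some(open) = rest.find('[') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        // Only flat brackets qualify: the next delimiter must be a ']'.
        match after.find(|c: char| c == '[' || c == ']') {
            Some(close) if after.as_bytes()[close] == b']' => {
                let inner = &after[..close];
                match split_one(inner) {
                    Some(split) => out.push_str(&split),
                    None => {
                        out.push('[');
                        out.push_str(inner);
                        out.push(']');
                    }
                }
                rest = &after[close + 1..];
            }
            _ => {
                // Unclosed or nested bracket: copy the '[' and move on.
                out.push('[');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out.push_str(tail);
    out
}
```

Because clean brackets are skipped, running the function on its own output returns the same string.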

Cross-provider determinism + headroom

- temperature = 0.5 on every LLM provider
  (Anthropic / Gemini / OpenAI / local-OpenAI-compatible).
  Defaults were 1.0 across the board — too random for structured
  output. Lowering it makes citation extraction reproducible run to
  run across all four providers.
- max_tokens 4096 → 8192 on Claude + local. Doubles the
  headroom for trailing <CITATIONS> JSON on long answers; Gemini
  was already on its default (≥8192 on the 2.x family).

Orphan KB chunks — chat-time filter + cleanup endpoint

User-reported failure mode: removing a synced doc from the UI left
its embeddings behind, and every chat turn the cosine retrieval kept
surfacing those stale chunks — one chat ended up emitting 12
citations all pointing to the same dead PDF page.

- retrieve_kb_chunks now probes each chunk's source_path on
  disk and drops missing ones with an
  [rag] orphan KB chunk dropped … warning + per-turn summary.
- New endpoint POST /sync/cleanup-orphans — per-user cascade
  delete of documents + doc_chunks + synced_files rows whose
  backing file is gone. Returns
  { scanned, orphans, deleted_docs, deleted_chunks, deleted_synced }.
- Frontend modal: when a citation source 404s, the doc viewer
  surfaces a warning panel with a Pulisci sorgenti rimosse button
  that calls the cleanup endpoint and toasts the row count.
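
The chat-time filter amounts to an existence probe per retrieved chunk. A minimal sketch follows, assuming a hypothetical `KbChunk` struct and an injectable `exists` check so the logic can be exercised without touching the disk. The log line's wording after the documented prefix is illustrative only.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical shape of a retrieved chunk; only `source_path` matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct KbChunk {
    pub source_path: PathBuf,
    pub text: String,
}

/// Keeps chunks whose backing file still exists and reports how many were
/// dropped, so the caller can emit a per-turn summary.
pub fn drop_orphan_chunks(
    chunks: Vec<KbChunk>,
    exists: impl Fn(&Path) -> bool,
) -> (Vec<KbChunk>, usize) {
    let total = chunks.len();
    let kept: Vec<KbChunk> = chunks
        .into_iter()
        .filter(|chunk| {
            let present = exists(&chunk.source_path);
            if !present {
                // Message tail is an assumption; the notes elide the real text.
                eprintln!("[rag] orphan KB chunk dropped: {}", chunk.source_path.display());
            }
            present
        })
        .collect();
    let dropped = total - kept.len();
    (kept, dropped)
}
```

In production the probe would be something like `|p| p.exists()`; passing it in keeps the filter itself free of filesystem access.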

DocxView — A4 fit (default) + reflow toggle

Preserves the document's native A4 page geometry (width + height +
margins, breakPages: true) and applies CSS
zoom = containerWidth / pageWidth (clamped to [0.4, 1.5]) via a
ResizeObserver so the page auto-scales when the user drags the
side-panel divider. A top-right toggle flips to a reflow mode (drops
the page geometry, prose flows the full panel width) for narrow
side-panel reading. The ResizeObserver is detached in reflow mode, so
that path pays no resize-handling cost.
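
The zoom arithmetic is small enough to spell out. The viewer itself lives in the frontend, so the Rust function below only illustrates the formula and the clamp; the function name, the pixel units, and the fallback of 1.0 for degenerate widths are assumptions.

```rust
pub const MIN_ZOOM: f64 = 0.4;
pub const MAX_ZOOM: f64 = 1.5;

/// zoom = containerWidth / pageWidth, clamped to [0.4, 1.5].
pub fn a4_fit_zoom(container_width_px: f64, page_width_px: f64) -> f64 {
    // Assumption: on a non-positive or non-finite width, do not scale at all.
    if !(page_width_px > 0.0) || !container_width_px.is_finite() {
        return 1.0;
    }
    (container_width_px / page_width_px).clamp(MIN_ZOOM, MAX_ZOOM)
}
```

For a page 794 px wide, a 397 px panel gives a zoom of 0.5, while a 100 px panel is held at the 0.4 floor and a 5000 px panel at the 1.5 ceiling.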

Diagnostic logging

- [rag][cite-diag] retrieve_kb_chunks … spells out HyDE on/off +
  locale + domain + top-K and clarifies that base cosine retrieval
  always runs regardless of HyDE (removes the recurring "I turned
  HyDE off, why is it still searching?" confusion).
- [chat][cite-diag] … events trace each step of the citation
  pipeline: final response shape, tail dump, per-step outcome, FINAL
  SSE payload size.
- [chat] <CITATIONS> block found but is not valid JSON warning
  dumps head / mid / tail (300 chars each) of the offending payload
  for offline diagnosis.

E2E test scaffold

New tests/medical_citations_e2e.rs bypasses the frontend entirely:
places 10 PDFs from tests/medical/ straight into cache + documents
rows, exercises the real POST /chat handler via
tower::ServiceExt::oneshot, parses the SSE stream, and prints a
structured JSON report of citation quality. Gated by #[ignore] +
GEMINI_API_KEY; A/B switches via E2E_HYDE / E2E_MODEL.

$env:GEMINI_API_KEY = "..."
cargo test --test medical_citations_e2e --features rag,pdf `
    -- --ignored --nocapture

25 May 13:18

dariofinardi

v0.4.7

acef3e0

v0.4.7 — version badge fix + chat-files popover + License panel

MikeRust v0.4.7

Rollup release that closes a string of UX and persistence gaps reported during the v0.4.x review cycle, plus a hotfix for the visible-version regression introduced in v0.4.6.

Highlights

Generated-docx Accept / Reject flow — full lifecycle

When the model emits a docx the user can now Accept it (keep in chat context) or Reject it (replace with an LLM-generated summary anchored on a mandatory user motive). Re-Accept restores the original; Re-Reject overwrites the archive with a fresh summary. A new "Vedi riassunto" read-only modal surfaces the archived reason + summary after the reject modal closes, so the user can re-read what the model now sees in place of the document.

Backend: documents.decision / decision_reason / decision_summary columns (migration 0029); POST /document/:id/decision runs the summariser and persists; chat::load_attached_docs substitutes the body with a reason + summary stub on every subsequent turn.

Chat-files popover — five categories, one shortcut

New Files button in the composer footer opens a popover listing every document the chat has ever interacted with, across the five categories the chat archive is expected to retain:

| Origin | Where it comes from |
| --- | --- |
| `Caricato` | composer paperclip → `documents.chat_id` |
| `Generato` | `generate_docx` tool → `documents.chat_id` |
| `Rifiutato` (variant) | any of the above with `decision='rejected'` — strikethrough + red badge |
| `Progetto` | `chats.project_id` → `documents.project_id` |
| `Citato` | KB / corpora docs cited via `messages.annotations` |

Per-format icon colours (Excel green / Word blue / PDF red / PowerPoint orange / Markdown text-primary). Click a row to open it in the existing doc-viewer side panel, where Accept / Reject / Vedi riassunto / Apri in Word all already work. Backed by a new GET /chat/:id/documents endpoint that survives chat reload, chat switching and message compaction.

Version badge + License panel

Small v{version} badge next to "MikeRust" in the sidebar so the user always knows which build is running. New Settings → Licenza panel shows MikeRust + version, the SPDX identifier (AGPL-3.0-only), a plain-language summary of the AGPL terms and the full bundled LICENSE text in a scrollable monospace block.

Multi-doc anamnesis docx (v0.3.6)

Fixed a Gemini tool-code crash on multi-document anamnesis flows and a doc-label off-by-one (1-indexed labels now match [doc-N] references).

Full changelog by patch

v0.4.7 — version badge actually renders (replaced runtime getVersion() + $state with build-time package.json import)
v0.4.6 — version label + License settings panel (broken; superseded by 0.4.7)
v0.4.5 — chat-files popover surfaces all 5 doc categories (project + KB-referenced)
v0.4.4 — chat-files popover backend-sourced, survives reload (new GET /chat/:id/documents)
v0.4.3 — chat-files popover MVP
v0.4.2 — fix reject modal step-2 transition (untrack(initialReason) in $effect)
v0.4.1 — persistent "Vedi riassunto" for rejected docs
v0.4.0 — domain-aware system-prompt prologue (66 .md files × 6 locales × 11 domains)

See HISTORY.md for the per-patch details and rationale.

Downloads

Pre-built MSIs for Windows:

MikeRust_0.4.7_x64.msi — Windows x86_64
MikeRust_0.4.7_arm64.msi — Windows ARM64 (Snapdragon X Elite native)

Each bundles onnxruntime.dll 1.20.0 and pdfium.dll. Double-click to install; runtime logs land in %USERPROFILE%\mikerust-data\mike-tauri.log.

License

MikeRust is distributed under AGPL-3.0-only. The Semplifica wordmark and logo are trademarks; see NOTICE.md. The full licence text is now also available in-app under Settings → Licenza.

24 May 13:39

dariofinardi

v0.3.2

d0a49a9

v0.3.2 — tool iter cap, Ollama probe proxy, biometric icon

Three independent fixes in one release.

Fixed — chat stops with "too many tool iterations" on multi-doc workflows

Reported on Gemini 2.5 Flash with a medical-anamnesis workflow attached to ten clinical-record PDFs: the model called read_document on a couple of files, then aborted with "stopped: too many tool iterations". MAX_TOOL_ITERATIONS = 5 in src/routes/chat.rs was a holdover from single-doc debug runs; legitimate due-diligence / medical-anamnesis flows need 5–15 source-doc reads before composing the answer. Bumped to 20 — bounds a runaway loop at ~20× the per-turn latency while comfortably covering ten-doc anamnesis flows. Fix applies to every LLM that does tool-use (Gemini, Claude, OpenAI, local Ollama / vLLM, …).
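
A bounded tool loop of the kind described can be sketched as below. Only the constant name, its value of 20, and the stop message come from the notes; the `ModelStep` and `TurnOutcome` types and the `run_turn` driver are hypothetical stand-ins for the real handler in src/routes/chat.rs.

```rust
pub const MAX_TOOL_ITERATIONS: usize = 20;

/// What the model asks for on one iteration (hypothetical shape).
#[derive(Debug, PartialEq)]
pub enum ModelStep {
    ToolCall(String),
    FinalAnswer(String),
}

#[derive(Debug, PartialEq)]
pub enum TurnOutcome {
    Answer(String),
    Stopped(&'static str),
}

/// Drives one chat turn: keeps serving tool calls until the model answers
/// or the iteration budget is spent.
pub fn run_turn(mut next_step: impl FnMut(usize) -> ModelStep) -> TurnOutcome {
    for iteration in 0..MAX_TOOL_ITERATIONS {
        match next_step(iteration) {
            ModelStep::FinalAnswer(text) => return TurnOutcome::Answer(text),
            ModelStep::ToolCall(_tool_name) => {
                // A real handler would execute the tool here and feed the
                // result back to the model before the next iteration.
            }
        }
    }
    TurnOutcome::Stopped("stopped: too many tool iterations")
}
```

With this budget a model that reads ten documents and then answers completes normally, while a model that never stops calling tools is cut off after 20 iterations.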

Fixed — Settings → "Modelli LLM" probe CORS-blocked

The Settings page used to issue fetch(`${base}/models`) directly from the WebView origin http://tauri.localhost. External Ollama / llama-server / vLLM instances rarely whitelist that origin, so the browser blocked every probe with the "No 'Access-Control-Allow-Origin' header is present" message and the model dropdown stayed empty.

New backend endpoint GET /models/local/probe?base=…&api_key=… does the server-to-server fetch (no Origin involved → no CORS) and returns the upstream payload verbatim, plus typed error fields (upstream_status, error).
ModelsSection.svelte now goes through the proxy instead of fetching the runtime directly.
The chat path was never affected — chat-time LLM calls already go through the Rust reqwest client.

Fixed — biometric unlock dialog used a hand-drawn fingerprint

BiometricPrompt.svelte rendered its fingerprint via seven inline SVG paths. The result looked off-centre and hairline-thin against the brand-500 background and didn't match the lucide-svelte icon family the rest of the app uses. Swapped for <Fingerprint size={40} class="text-(--color-brand-500)" /> from lucide-svelte. The currentColor inheritance preserves the brand-tint behaviour, so no styling change at the call site.

Installer artefacts

MikeRust_0.3.2_x64.msi — Windows x86_64
MikeRust_0.3.2_arm64.msi — Windows ARM64

See HISTORY.md for the cumulative v0.2.x → v0.3.x timeline.

24 May 08:44

dariofinardi

v0.3.1

bc4708e

release: v0.3.1 — bundle JSON config registries in the MSI

Configs, workflows, and templates were missing from the MSI; this release bundles them.

24 May 07:39

dariofinardi

v0.3.0

e797472

v0.3.0 — installed-MSI storage ACCESS_DENIED fix

Symptom in v0.2.x installed MSI: uploading a document or opening
the viewer surfaced "Could not load document — Accesso negato
(os error 5)" and the backend logged ACCESS_DENIED on storage writes.

Root cause: STORAGE_PATH defaulted to the cwd-relative
./data/storage/. For dev (cargo run from workspace root) that
worked. For an MSI launched from a Start-menu shortcut the cwd is
typically C:\Program Files\MikeRust\, which non-elevated processes
cannot write to, so create_dir_all failed with os error 5 and every
subsequent put returned HTTP 500.

Fix: LocalStorage::new() now defaults to
<USERPROFILE|HOME>/mikerust-data/storage/, the same user-writable
directory the SQLite DB and the v0.2.5 PII cache already live under.
The STORAGE_PATH env override stays for tests / fixtures / the
standalone-backend dev story.
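
The path resolution described in the fix can be sketched as a pure function. The function name, the injectable `env` lookup, and the precedence of USERPROFILE over HOME are assumptions; the notes only state that STORAGE_PATH remains an override and that the default sits under `<USERPROFILE|HOME>/mikerust-data/storage/`.

```rust
use std::path::PathBuf;

/// Resolves the storage root. `env` abstracts environment lookup so the
/// logic can be tested without mutating the process environment.
pub fn default_storage_root(env: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    // An explicit override wins (tests, fixtures, standalone backend).
    if let Some(explicit) = env("STORAGE_PATH") {
        return Some(PathBuf::from(explicit));
    }
    // Assumption: USERPROFILE is consulted first, then HOME.
    let home = env("USERPROFILE").or_else(|| env("HOME"))?;
    Some(PathBuf::from(home).join("mikerust-data").join("storage"))
}
```

A caller would pass `|key| std::env::var(key).ok()`. Returning `None` when neither variable is set leaves that decision to the caller instead of silently falling back to a cwd-relative path.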

Installer artefacts

MikeRust_0.3.0_x64.msi — Windows x86_64
MikeRust_0.3.0_arm64.msi — Windows ARM64

See HISTORY.md for the full v0.2.x → v0.3.0 timeline (PII vertical, SSE refactor, cold-launch fixes, the leak-fix cascade, and this storage path patch).

23 May 19:09

dariofinardi

v0.2.7

458d543

MikeRust 0.2.7

v0.2.7 — close remaining PII bypasses

v0.2.6 closed the inline-attached redaction path for follow-up turns,
but two parallel routes kept feeding the LLM raw, unredacted text.
Confirmed on disk: the cache/pii/<doc_id>.txt cache had every
license plate masked to [LICENSE_PLATE], yet the model answer
still quoted the original plates — the leak was via the RAG
retrieval branch and (potentially) via the read_document /
find_in_document tools.

Fix

- retrieve_kb_chunks now consults documents.pii_protected per
  chunk and drops every chunk whose source document is flagged.
  Each drop is logged in mike-tauri.log at info level.
- resolve_doc + a new read_doc_text_for_llm helper substitute
  the redacted cache file for the raw bytes whenever the document
  is protected. A cache miss is a hard safety stop ("wait for the
  inline-attached pass to finish") instead of falling back to the
  raw text.
- edit_document refuses outright when targeting a protected
  document — overwriting the raw docx and offering it as a
  download would re-expose every entity the redacted cache had
  carefully masked.
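
The two guards can be sketched as follows. The signatures are hypothetical: the real `retrieve_kb_chunks` and `read_doc_text_for_llm` presumably work against the database and the cache directory, whereas this sketch takes the protection flag and the cached text as plain arguments to show the fail-closed behaviour on a cache miss.

```rust
/// Hypothetical shape of a retrieved chunk.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub doc_id: String,
    pub text: String,
}

/// Drops every chunk whose source document is flagged as PII-protected.
pub fn drop_protected_chunks(
    chunks: Vec<Chunk>,
    is_protected: impl Fn(&str) -> bool,
) -> Vec<Chunk> {
    chunks
        .into_iter()
        .filter(|chunk| !is_protected(&chunk.doc_id))
        .collect()
}

/// Chooses the text handed to the LLM. For a protected document only the
/// redacted cache is acceptable; a cache miss is an error, never a fallback
/// to the raw text.
pub fn read_doc_text_for_llm(
    protected: bool,
    raw_text: &str,
    redacted_cache: Option<&str>,
) -> Result<String, String> {
    if !protected {
        return Ok(raw_text.to_string());
    }
    match redacted_cache {
        Some(redacted) => Ok(redacted.to_string()),
        None => Err("redacted text not ready: wait for the inline-attached pass to finish".to_string()),
    }
}
```

Returning an error on a cache miss keeps the raw text out of reach on every code path that handles a protected document.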

Installer artefacts

MikeRust_0.2.7_x64.msi — Windows x86_64
MikeRust_0.2.7_arm64.msi — Windows ARM64

See HISTORY.md for the full themed entry covering this and the v0.2.2 → v0.2.6 lead-up patches.

Releases: SemplificaAI/MikeRust
