CSV Anonymizer is a local-first desktop app that reduces the exposure of sensitive data in CSV files and pasted text before sharing, testing, demos, or support work. It detects likely personal data, previews transformations, and writes protected output while preserving the original structure where possible.
All non-LLM detection and transformation runs locally in Rust. Optional local LLM replacement also runs on your machine through Ollama.
Read the generated project wiki at github.com/ddv1982/csv-data-anonymizer/wiki.
- Detects common sensitive fields: emails, names, phone numbers, UUIDs, timestamps, numeric IDs, addresses, postal codes, IPs, URLs, MAC addresses, tax IDs, VAT/BTW numbers, and more.
- Auto-selects high and medium risk columns while still letting you choose exactly which columns to transform.
- Shows a preview before writing output. Rule-based preview replacements are examples; final output gets its own randomized run.
- Streams CSV file transformations instead of loading the whole file into memory.
- Supports lightweight paste workflows for CSV, JSON, XML, YAML, plain text, and logs up to 5 MiB; larger CSV inputs should use the streaming file workflow.
- Includes Quick by Data Type generation for creating protected sample values without first providing input data.
- Keeps repeated source values consistent within each run.
- Offers optional Smart replacement with a local LLM for selected columns.
- Produces a privacy report with transformed column counts, redaction counts, reused values, token counts, Local AI replacement counts, and fallbacks.
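The "consistent within each run" behavior and the "reused values" count in the report can be pictured with a small per-run cache. This is a minimal sketch, not the app's actual implementation: `RunCache`, `replacement_for`, and the `value_NNNN` output format are hypothetical names chosen for illustration, and a counter stands in for randomized output so the example stays deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical per-run cache: remembers the replacement chosen the first
/// time each source value is seen, and counts how often one is reused.
struct RunCache {
    replacements: HashMap<String, String>,
    reused: usize,
}

impl RunCache {
    fn new() -> Self {
        RunCache { replacements: HashMap::new(), reused: 0 }
    }

    /// Returns the replacement already chosen for `source`, or mints a new one.
    fn replacement_for(&mut self, source: &str) -> String {
        if let Some(existing) = self.replacements.get(source).cloned() {
            self.reused += 1;
            return existing;
        }
        // Assumption: a counter-based placeholder instead of a random value.
        let fresh = format!("value_{:04}", self.replacements.len() + 1);
        self.replacements.insert(source.to_string(), fresh.clone());
        fresh
    }
}

fn main() {
    let mut cache = RunCache::new();
    for source in ["alice@example.com", "bob@example.com", "alice@example.com"] {
        println!("{} -> {}", source, cache.replacement_for(source));
    }
    println!("reused values: {}", cache.reused);
}
```

Because the cache lives only as long as one run, a second run starts empty and may map the same source value to a different replacement.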
The app UI is currently English. CSV and pasted values are read as UTF-8, and detector rules are Unicode-aware. Detection coverage is fixture-backed, but it is not a claim of full multilingual parity.
Header-based sensitive-field detection includes a maintained taxonomy for English, Dutch, German, French, Spanish, Portuguese, and Italian, plus a small Japanese pilot for unambiguous phone, address, name, and date headers. Header matching handles Unicode normalization, word segmentation, accent folding for Latin terms, camelCase splitting, compact aliases such as apikey, homephone, and person_id, and conservative fuzzy matching for longer taxonomy terms with sample-value confirmation.
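As a rough illustration of the header matching described above, the sketch below splits a header on separators and camelCase boundaries, then checks both the individual words and the compact joined form against a tiny lookup table. The `TAXONOMY` entries, category labels, and function names are hypothetical; accent folding, Unicode normalization, and fuzzy matching are left out because they would need more than the standard library.

```rust
/// Splits a header into lowercase words on separators and camelCase boundaries.
fn header_words(header: &str) -> Vec<String> {
    let mut words = Vec::new();
    let mut current = String::new();
    let mut prev_lower = false;
    for ch in header.chars() {
        if !ch.is_alphanumeric() {
            // Separators such as '_', '-', or ' ' end the current word.
            if !current.is_empty() {
                words.push(std::mem::take(&mut current));
            }
            prev_lower = false;
            continue;
        }
        // A lowercase-to-uppercase transition marks a camelCase boundary.
        if ch.is_uppercase() && prev_lower && !current.is_empty() {
            words.push(std::mem::take(&mut current));
        }
        prev_lower = ch.is_lowercase();
        current.extend(ch.to_lowercase());
    }
    if !current.is_empty() {
        words.push(current);
    }
    words
}

/// Hypothetical mini-taxonomy of (normalized term, category) pairs.
const TAXONOMY: &[(&str, &str)] = &[
    ("email", "email"),
    ("phone", "phone"),
    ("telefoon", "phone"), // Dutch
    ("apikey", "secret"),
    ("personid", "identifier"),
];

/// Matches either a single word or the compact form (all words joined).
fn classify_header(header: &str) -> Option<&'static str> {
    let words = header_words(header);
    let compact: String = words.concat();
    TAXONOMY
        .iter()
        .find(|(term, _)| compact == *term || words.iter().any(|w| w == term))
        .map(|(_, category)| *category)
}

fn main() {
    for header in ["homePhone", "person_id", "Telefoon", "APIKey", "notes"] {
        println!("{:>10} -> {:?}", header, classify_header(header));
    }
}
```

Matching on any single word is deliberately loose in this sketch; the sample-value confirmation mentioned above is the kind of second check that would keep such matches conservative.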
Value validators run independently of header language for structured values such as email, UUID, IP address, URL, MAC address, IBAN, payment cards, VAT IDs, Dutch BTW/omzetbelastingnummer, US SSN/EIN, and formatted phone numbers. Dutch BTW values without an NL prefix are detected only under Dutch BTW header context.
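A value validator of this kind looks only at the value itself, never at the header. As one example, payment card numbers are commonly checked with the Luhn checksum. The sketch below is a generic Luhn check, not the app's own validator; the function name `passes_luhn` and the 12 to 19 digit length window are assumptions made for illustration.

```rust
/// Generic Luhn checksum over the digits of `value`, ignoring spaces and hyphens.
fn passes_luhn(value: &str) -> bool {
    let mut digits: Vec<u32> = Vec::new();
    for ch in value.chars() {
        match ch {
            ' ' | '-' => continue,
            _ => match ch.to_digit(10) {
                Some(d) => digits.push(d),
                None => return false, // any other character disqualifies the value
            },
        }
    }
    // Assumption: accept the commonly used 12 to 19 digit range for card numbers.
    if digits.len() < 12 || digits.len() > 19 {
        return false;
    }
    // Starting from the rightmost digit, double every second digit and
    // subtract 9 when the doubled value exceeds 9.
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}

fn main() {
    for candidate in ["4111 1111 1111 1111", "4111 1111 1111 1112", "not-a-card"] {
        println!("{:?} -> {}", candidate, passes_luhn(candidate));
    }
}
```

A checksum like this only shows that a value is shaped like a card number; it says nothing about whether the number belongs to a real account.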
Smart replacement is optional and off by default. It is designed for columns where rule-based masking is too mechanical and you want more realistic fake values.
The first implementation uses:
- Ollama running on localhost
- gemma3:4b as the lightweight default model
- In-app status checks, setup link, model download, progress, and cancel controls
Usage:
- Install or start Ollama.
- In CSV Anonymizer, open Local AI setup when Smart replacement prompts for it.
- Download gemma3:4b from the app if it is not already available.
- Select Smart replacement (Local AI) for the columns that should use the model.
- Review the preview, then run the transformation.
The app batches unique values per selected column, asks the local model for realistic fake replacements, validates the response, reuses accepted replacements for repeated source values within the current run, and falls back to rule-based pseudonymization when the model output is missing or unsafe.
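The flow in the paragraph above (batch unique values, ask the model, validate, reuse, fall back) can be sketched as follows. Everything here is illustrative: `replace_column`, `is_acceptable`, and `rule_based_fallback` are hypothetical names, the model is represented by a plain closure rather than a real Ollama call, and the safety check is a simple stand-in for whatever validation the app performs.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for rule-based pseudonymization.
fn rule_based_fallback(index: usize) -> String {
    format!("value_{}", index + 1)
}

/// Hypothetical safety check: reject empty output and anything that echoes a source value.
fn is_acceptable(candidate: &str, sources: &HashSet<&str>) -> bool {
    let trimmed = candidate.trim();
    !trimmed.is_empty() && !sources.contains(trimmed)
}

/// Returns the replaced column and the number of fallbacks used.
fn replace_column<F>(column: &[&str], ask_model: F) -> (Vec<String>, usize)
where
    F: Fn(&[&str]) -> Vec<Option<String>>,
{
    // 1. Batch unique values, preserving first-seen order.
    let mut unique: Vec<&str> = Vec::new();
    let mut seen: HashSet<&str> = HashSet::new();
    for &value in column {
        if seen.insert(value) {
            unique.push(value);
        }
    }
    // 2. Ask the (stand-in) model once for the whole batch.
    let answers = ask_model(&unique);
    // 3. Validate each answer; fall back when it is missing or unsafe.
    let mut mapping: HashMap<&str, String> = HashMap::new();
    let mut fallbacks = 0;
    for (i, &source) in unique.iter().enumerate() {
        let accepted = answers
            .get(i)
            .cloned()
            .flatten()
            .filter(|candidate| is_acceptable(candidate, &seen));
        let replacement = accepted.unwrap_or_else(|| {
            fallbacks += 1;
            rule_based_fallback(i)
        });
        mapping.insert(source, replacement);
    }
    // 4. Reuse the accepted replacement for every repeated source value.
    let output = column.iter().map(|value| mapping[value].clone()).collect();
    (output, fallbacks)
}

fn main() {
    let column = ["Ann", "Bob", "Ann", "Cy"];
    // Fake model: one good answer, one that leaks a source value, one missing.
    let (output, fallbacks) = replace_column(&column, |_batch| {
        vec![Some("Maria".to_string()), Some("Ann".to_string()), None]
    });
    println!("{:?} (fallbacks: {})", output, fallbacks);
}
```

Deduplicating before the model call keeps the batch small and makes consistency a side effect of the lookup table rather than something the model has to get right.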
Model weights and local runtime binaries are not bundled in the repository or desktop release. The first model download uses network access through Ollama. CSV values selected for Smart replacement are sent only to the configured local Ollama endpoint.
The standard workflow transforms selected values in place: CSV file output keeps the source rows and columns, while pasted structured or text workflows keep the original shape where possible. It redacts, masks, pseudonymizes, tokenizes, or locally replaces selected values. It reduces exposure, but the output is still transformed source data, not guaranteed anonymous data.
It does not produce formal anonymity, differential privacy aggregates, or synthetic datasets. Review previews and privacy reports before sharing generated files.
| Strategy | Use |
|---|---|
| Redact | Replace values with typed placeholders such as [EMAIL], [PERSON], or [DATE]. |
| Mask | Replace values with simple masked output. |
| Pseudonymize | Generate readable or shape-preserving fake values. |
| Tokenize | Replace values with opaque tok_... tokens that stay consistent within the current run. |
| Smart replacement (Local AI) | Use a local LLM through Ollama for more realistic fake replacements. |
| Pass through | Leave values unchanged. |
Examples of format preservation include email domains, UUID shape, timestamp precision, numeric width and decimals, phone separators, and full-name token count.
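Two of these cases, phone separators and email domains, are easy to picture in code. The sketch below is an assumption-laden illustration rather than the app's algorithm: `Lcg`, `pseudonymize_phone`, and `pseudonymize_email` are hypothetical names, and a tiny deterministic generator replaces real randomness so the example needs no external crates.

```rust
/// Tiny deterministic generator (a linear congruential generator),
/// used here only so the sketch has no dependencies.
struct Lcg(u64);

impl Lcg {
    fn next_digit(&mut self) -> char {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        char::from(b'0' + ((self.0 >> 33) % 10) as u8)
    }
}

/// Replaces every digit but keeps '+', spaces, hyphens, and brackets in place.
fn pseudonymize_phone(phone: &str, rng: &mut Lcg) -> String {
    phone
        .chars()
        .map(|c| if c.is_ascii_digit() { rng.next_digit() } else { c })
        .collect()
}

/// Replaces the local part but keeps the domain after the last '@'.
fn pseudonymize_email(email: &str, counter: u32) -> String {
    match email.rsplit_once('@') {
        Some((_, domain)) => format!("user{}@{}", counter, domain),
        None => format!("user{}", counter),
    }
}

fn main() {
    let mut rng = Lcg(42);
    println!("{}", pseudonymize_phone("+31 (0)20-555 0199", &mut rng));
    println!("{}", pseudonymize_email("jane.doe@example.org", 7));
}
```

Keeping the shape intact is what lets downstream parsers and validators accept the protected output, but it also means some structure of the source (such as the email domain) remains visible.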
Download desktop builds from GitHub Releases.
macOS:
- Download the .dmg for your Mac.
- Use aarch64 for Apple Silicon and x64 for Intel.
- Drag the app into Applications.
Linux:
- Download the .AppImage, .deb, or .rpm from the latest release.
- For direct downloads, also download the matching .sha256 and .sha256.asc files and verify them with the release signing key (csv-anonymizer-archive-keyring.pgp) before installing.
- Debian/Ubuntu users can enable the signed APT repository:

bash <(curl -fsSL https://ddv1982.github.io/csv-data-anonymizer/install-apt-repo.sh)
sudo apt update
sudo apt install csv-anonymizer

After the repository is enabled, normal sudo apt update and sudo apt upgrade runs handle updates.
Requirements:
- Rust stable
- Node.js 22.13 or newer
- Frontend dependencies from frontend/package-lock.json
- Playwright Chromium for browser e2e checks:

cd frontend && npx playwright install chromium
Setup:
npm ci --prefix frontend

Run the desktop app:

npm run tauri:dev

Useful checks:
npm run typecheck
npm run lint
npm run test
npm run fmt
npm run deadcode:required
npm run docs:check
npm run release:check
npm run tauri:prebuilt:check
npm run artifacts:rust:check
npm run linux:package-manager:check
npm run frontend:e2e
npm run frontend:a11y
npm run frontend:audit
npm run cargo:audit
cargo bench -p csv-anonymizer-core --bench csv_streaming
cargo bench -p csv-anonymizer-core --bench detector_matrix -- --sample-size 10
node scripts/rust-smoke.mjs

The root lint, test, typecheck, fmt, docs:check, and deadcode:required scripts are the canonical local gates. The dead-code scans use Knip for the frontend and cargo-machete for Rust dependency drift, and the weekly GitHub Actions maintenance workflow runs the same required dead-code gate. The detector matrix benchmark measures the built-in detector only; the external PII library comparison is archived in docs/detector-library-evaluation.md.
- frontend - React/Vite desktop UI.
- src-tauri - Tauri shell, app settings, commands, background jobs, and Ollama integration.
- crates/csv-anonymizer-core - CSV detection, preview, transformation, reporting, and tests.
- crates/csv-anonymizer-app - lightweight CLI smoke harness for the shared core.
- build - package metadata, icons, and platform assets.
- scripts - release, packaging, metadata, APT, and smoke-test tooling.
Release steps and signing requirements are documented in docs/releasing.md.