CSV Anonymizer

CSV Anonymizer is a local-first desktop app for reducing sensitive CSV and pasted-data exposure before sharing, testing, demos, or support work. It detects likely personal data, previews transformations, and writes protected output while preserving the original structure where possible.

All non-LLM detection and transformation run locally in Rust. Optional local LLM replacement also runs on your machine through Ollama.

Read the generated project wiki at github.com/ddv1982/csv-data-anonymizer/wiki.

What It Does

Detects common sensitive fields: emails, names, phone numbers, UUIDs, timestamps, numeric IDs, addresses, postal codes, IPs, URLs, MAC addresses, tax IDs, VAT/BTW numbers, and more.
Auto-selects high and medium risk columns while still letting you choose exactly which columns to transform.
Shows a preview before writing output. Rule-based preview replacements are examples only; the final output is generated in its own randomized run, so its values can differ from the preview.
Streams CSV file transformations instead of loading the whole file into memory.
Supports lightweight paste workflows for CSV, JSON, XML, YAML, plain text, and logs up to 5 MiB; larger CSV inputs should use the streaming file workflow.
Includes Quick by Data Type generation for creating protected sample values without first providing input data.
Keeps repeated source values consistent within each run.
Offers optional Smart replacement with a local LLM for selected columns.
Produces a privacy report with transformed column counts, redaction counts, reused values, token counts, Local AI replacement counts, and fallbacks.
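
The streaming and within-run consistency behaviours listed above can be illustrated with a minimal Python sketch. It is not the app's Rust implementation: `anonymize_stream`, the `fake_` value format, and the sample column names are hypothetical and chosen only for illustration.

```python
import csv
import io
import secrets


def anonymize_stream(reader, writer, columns):
    """Rewrite the selected columns one row at a time.

    Only the current row and the replacement map are held in memory,
    so the input file is never loaded as a whole.
    """
    seen = {}  # (column, source value) -> replacement, valid for this run only
    rows = csv.DictReader(reader)
    out = csv.DictWriter(writer, fieldnames=rows.fieldnames)
    out.writeheader()
    for row in rows:
        for col in columns:
            key = (col, row[col])
            if key not in seen:
                # Hypothetical replacement format; a real tool would
                # generate a type-appropriate fake value here.
                seen[key] = f"fake_{secrets.token_hex(4)}"
            row[col] = seen[key]
        out.writerow(row)
    return len(seen)  # number of distinct values replaced


# Usage: the repeated email in rows 1 and 3 receives the same replacement.
source = io.StringIO("id,email\n1,a@example.com\n2,b@example.com\n3,a@example.com\n")
target = io.StringIO()
distinct = anonymize_stream(source, target, ["email"])
```

The replacement map in this sketch grows with the number of distinct values, which is the price of keeping repeated values consistent without a second pass over the file.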

Detection Language Coverage

The app UI is currently English. CSV and pasted values are read as UTF-8, and detector rules are Unicode-aware. Detection coverage is fixture-backed, but it is not a claim of full multilingual parity.

Header-based sensitive-field detection includes a maintained taxonomy for English, Dutch, German, French, Spanish, Portuguese, and Italian, plus a small Japanese pilot for unambiguous phone, address, name, and date headers. Header matching handles Unicode normalization, word segmentation, accent folding for Latin terms, camelCase splitting, and compact aliases such as `apikey`, `homephone`, and `person_id`. Longer taxonomy terms also get conservative fuzzy matching, which requires confirmation from sample values.
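
To make the header-matching steps concrete, here is a rough Python sketch of Unicode normalization, camelCase splitting, accent folding, and compact-alias lookup. The `TAXONOMY` table and both function names are hypothetical; the app's maintained taxonomy and its fuzzy matching are not reproduced here.

```python
import re
import unicodedata

# Hypothetical mini-taxonomy for illustration only.
TAXONOMY = {
    "phone": {"phone", "homephone", "telefoon", "telefon", "telephone", "telefono"},
    "email": {"email", "mail", "correo"},
}


def header_tokens(header):
    """Split a column header into lowercase, accent-folded word tokens."""
    text = unicodedata.normalize("NFKC", header)
    # camelCase splitting: "homePhone" -> "home Phone".
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Accent folding: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Split on anything that is not a letter or digit (underscore included).
    return [t for t in re.split(r"[\W_]+", folded.casefold()) if t]


def classify_header(header):
    """Return a taxonomy label for the header, or None when nothing matches."""
    tokens = header_tokens(header)
    # Also try the joined form so compact aliases such as "homephone" match.
    candidates = set(tokens) | {"".join(tokens)}
    for label, terms in TAXONOMY.items():
        if candidates & terms:
            return label
    return None


# Usage
labels = [classify_header(h) for h in ("homePhone", "Téléphone", "E-mail", "order_total")]
```

Exact token lookup like this is deliberately strict; any fuzzy matching would need an extra confirmation step to avoid false positives on short terms.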

Value validators run independently of header language for structured values such as email, UUID, IP address, URL, MAC address, IBAN, payment cards, VAT IDs, Dutch BTW/omzetbelastingnummer, US SSN/EIN, and formatted phone numbers. Dutch BTW values without an NL prefix are detected only under Dutch BTW header context.
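
As an illustration of header-independent value validation, the following Python sketch checks a few structured types and shows the header-context rule for Dutch BTW numbers. It is an assumption-laden sketch, not the app's detector: the function names are hypothetical, the email pattern is deliberately simple, and the BTW check covers only the `NL` + 9 digits + `B` + 2 digits layout without any checksum.

```python
import ipaddress
import re

UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I)
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # intentionally loose
BTW_RE = re.compile(r"(NL)?\d{9}B\d{2}", re.I)


def luhn_valid(digits):
    """Luhn checksum, as used by payment card numbers."""
    total = 0
    for index, ch in enumerate(reversed(digits)):
        d = int(ch)
        if index % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_value(value):
    """Classify a single value by its structure, ignoring the column header."""
    value = value.strip()
    if EMAIL_RE.fullmatch(value):
        return "email"
    if UUID_RE.fullmatch(value):
        return "uuid"
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    digits = re.sub(r"[ -]", "", value)
    if digits.isdigit() and 13 <= len(digits) <= 19 and luhn_valid(digits):
        return "payment_card"
    return None


def is_dutch_btw(value, dutch_btw_header=False):
    """Accept prefixed values anywhere; unprefixed ones only under a BTW header."""
    match = BTW_RE.fullmatch(value.replace(" ", "").replace(".", ""))
    if not match:
        return False
    return bool(match.group(1)) or dutch_btw_header
```

Requiring header context for the unprefixed form reflects the fact that a bare 12-character value is much more likely to collide with unrelated identifiers.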

Local LLM Smart Replacement

Smart replacement is optional and off by default. It is designed for columns where rule-based masking is too mechanical and you want more realistic fake values.

The first implementation uses:

Ollama running on localhost
gemma3:4b as the lightweight default model
In-app status checks, setup link, model download, progress, and cancel controls

Usage:

Install or start Ollama.
In CSV Anonymizer, open Local AI setup when Smart replacement prompts for it.
Download gemma3:4b from the app if it is not already available.
Select Smart replacement (Local AI) for the columns that should use the model.
Review the preview, then run the transformation.

The app batches unique values per selected column, asks the local model for realistic fake replacements, validates the response, reuses accepted replacements for repeated source values within the current run, and falls back to rule-based pseudonymization when the model output is missing or unsafe.
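
The batch, validate, reuse, and fall-back flow described above might look roughly like the Python sketch below. Everything here is an assumption for illustration: the prompt wording, the `smart_replace` and `rule_based_fallback` names, and the acceptance rule (a non-empty string that does not echo any source value) are hypothetical. The only external detail relied on is Ollama's `/api/generate` endpoint on its default local port.

```python
import json
import secrets
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def ask_ollama(values, model="gemma3:4b"):
    """Request one fake replacement per value; the prompt wording is illustrative."""
    prompt = (
        "Reply with only a JSON array of strings: one realistic fake "
        "replacement per item, in the same order. Items: " + json.dumps(values)
    )
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.load(response)
    return json.loads(reply["response"])  # raises if the model did not return JSON


def rule_based_fallback():
    """Stand-in for rule-based pseudonymization (hypothetical format)."""
    return "pseudo_" + secrets.token_hex(4)


def smart_replace(column_values, generate=ask_ollama):
    """Return (replaced values, fallback count) for one selected column."""
    unique = list(dict.fromkeys(column_values))  # batch unique values, keep order
    try:
        proposed = generate(unique)
    except Exception:
        proposed = None  # model unreachable or reply unparsable
    if not isinstance(proposed, list) or len(proposed) != len(unique):
        proposed = [None] * len(unique)
    sources = set(unique)
    mapping = {}
    fallbacks = 0
    for source, fake in zip(unique, proposed):
        accepted = isinstance(fake, str) and bool(fake.strip()) and fake not in sources
        if not accepted:
            fake = rule_based_fallback()
            fallbacks += 1
        mapping[source] = fake
    # Repeated source values reuse the accepted replacement.
    return [mapping[value] for value in column_values], fallbacks
```

Passing the generator in as a parameter keeps the validation and fallback logic testable without a running model, for example `smart_replace(names, generate=lambda values: ["Alex Doe"] * len(values))`.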

Model weights and local runtime binaries are not bundled in the repository or desktop release. The first model download uses network access through Ollama. CSV values selected for Smart replacement are sent only to the configured local Ollama endpoint.

Privacy Boundary

The standard workflow transforms selected values in place: CSV file output keeps the source rows and columns, while pasted structured or text workflows keep the original shape where possible. It redacts, masks, pseudonymizes, tokenizes, or locally replaces selected values. It reduces exposure, but the output is still transformed source data, not guaranteed anonymous data.

It does not provide formal anonymity guarantees, differentially private aggregates, or synthetic datasets. Review previews and privacy reports before sharing generated files.

Strategies

| Strategy | Use |
| --- | --- |
| Redact | Replace values with typed placeholders such as `[EMAIL]`, `[PERSON]`, or `[DATE]`. |
| Mask | Replace values with simple masked output. |
| Pseudonymize | Generate readable or shape-preserving fake values. |
| Tokenize | Replace values with opaque `tok_...` tokens that stay consistent within the current run. |
| Smart replacement (Local AI) | Use a local LLM through Ollama for more realistic fake replacements. |
| Pass through | Leave values unchanged. |

Examples of format preservation include email domains, UUID shape, timestamp precision, numeric width and decimals, phone separators, and full-name token count.
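
For a feel of how the strategies differ, here is a small Python sketch of redaction, masking, run-scoped tokenization, and two format-preserving pseudonymizers. These are illustrative assumptions rather than the app's algorithms: the function names are hypothetical, and the keyed-hash approach to `tok_...` tokens is one plausible way to get consistency within a single run.

```python
import hashlib
import hmac
import secrets
import string

RUN_KEY = secrets.token_bytes(32)  # fresh per run, so tokens are stable only within it


def redact(kind):
    """Typed placeholder, e.g. redact("email") -> "[EMAIL]"."""
    return f"[{kind.upper()}]"


def mask(value):
    """Simple mask that hides every character."""
    return "*" * len(value)


def tokenize(value, key=RUN_KEY):
    """Opaque token; the same value and key always give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def pseudonymize_email(value):
    """Randomize the local part but keep its length and the domain."""
    local, _, domain = value.rpartition("@")
    fake_local = "".join(secrets.choice(string.ascii_lowercase) for _ in local)
    return f"{fake_local}@{domain}"


def pseudonymize_digits(value):
    """Replace each ASCII digit; separators, width, and decimal points stay put."""
    return "".join(
        secrets.choice(string.digits) if ch in string.digits else ch for ch in value
    )


# Usage
placeholder = redact("email")
token = tokenize("jane.doe@example.com")
fake_email = pseudonymize_email("jane.doe@example.com")
fake_phone = pseudonymize_digits("+31 6 1234-5678")
```

Because `RUN_KEY` is regenerated each time the module loads, tokens from different runs of this sketch cannot be joined against each other.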

Install

Download desktop builds from GitHub Releases.

macOS:

Download the .dmg for your Mac.
Use aarch64 for Apple Silicon and x64 for Intel.
Drag the app into Applications.

Linux:

Download the .AppImage, .deb, or .rpm from the latest release.
For direct downloads, also download the matching .sha256 and .sha256.asc files and verify them with the release signing key (csv-anonymizer-archive-keyring.pgp) before installing.
Debian/Ubuntu users can enable the signed APT repository:

```bash
bash <(curl -fsSL https://ddv1982.github.io/csv-data-anonymizer/install-apt-repo.sh)
sudo apt update
sudo apt install csv-anonymizer
```

After the repository is enabled, normal `sudo apt update` and `sudo apt upgrade` runs handle updates.

Development

Requirements:

Rust stable
Node.js 22.13 or newer
Frontend dependencies from frontend/package-lock.json
Playwright Chromium for browser e2e checks: `cd frontend && npx playwright install chromium`

Setup:

```bash
npm ci --prefix frontend
```

Run the desktop app:

```bash
npm run tauri:dev
```

Useful checks:

```bash
npm run typecheck
npm run lint
npm run test
npm run fmt
npm run deadcode:required
npm run docs:check
npm run release:check
npm run tauri:prebuilt:check
npm run artifacts:rust:check
npm run linux:package-manager:check
npm run frontend:e2e
npm run frontend:a11y
npm run frontend:audit
npm run cargo:audit
cargo bench -p csv-anonymizer-core --bench csv_streaming
cargo bench -p csv-anonymizer-core --bench detector_matrix -- --sample-size 10
node scripts/rust-smoke.mjs
```

The root lint, test, typecheck, fmt, docs:check, and deadcode:required scripts are the canonical local gates. The dead-code scans use Knip for the frontend and cargo-machete for Rust dependency drift, and the weekly GitHub Actions maintenance workflow runs the same required dead-code gate. The detector matrix benchmark measures the built-in detector only; the external PII library comparison is archived in docs/detector-library-evaluation.md.

Project Layout

frontend - React/Vite desktop UI.
src-tauri - Tauri shell, app settings, commands, background jobs, and Ollama integration.
crates/csv-anonymizer-core - CSV detection, preview, transformation, reporting, and tests.
crates/csv-anonymizer-app - lightweight CLI smoke harness for the shared core.
build - package metadata, icons, and platform assets.
scripts - release, packaging, metadata, APT, and smoke-test tooling.

Release steps and signing requirements are documented in docs/releasing.md.
