Willszs/PDF-reduction

Engineering PDF Redactor

Python desktop tool for anonymizing engineering drawing PDFs on macOS.

What It Does

  • Performs true PDF redaction with PyMuPDF (add_redact_annot + apply_redactions)
  • Removes content inside redaction regions for text, images, and vector graphics
  • Supports batch processing from GUI or CLI
  • Writes outputs to _prodc/ by default
  • Prevents accidental overwrite of the input file (uses *_redacted.pdf when needed)
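The output rules above can be sketched with `pathlib`. This is a minimal illustration, not the code in `app.py`: the function name `resolve_output_path` is hypothetical, and only the `_prodc/` default and the `*_redacted.pdf` fallback come from this README.

```python
from pathlib import Path
from typing import Optional

DEFAULT_DIR_NAME = "_prodc"  # default output folder name stated in this README


def resolve_output_path(source: Path, output_dir: Optional[Path] = None) -> Path:
    """Return where the redacted copy of `source` would be written (hypothetical helper)."""
    # Default: a _prodc/ folder next to the source file.
    target_dir = output_dir if output_dir is not None else source.parent / DEFAULT_DIR_NAME
    target = target_dir / source.name
    # If the target would be the input file itself, fall back to *_redacted.pdf.
    if target.resolve() == source.resolve():
        target = target_dir / f"{source.stem}_redacted.pdf"
    return target
```

With this sketch, an input of `/tmp/drawings/a.pdf` and an output folder of `/tmp/drawings` would yield `/tmp/drawings/a_redacted.pdf` rather than overwriting the original.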

Available Template Options

The GUI exposes three template options:

  • default_auto
  • templateA
  • templateB

default_auto is built in (no JSON file required).
templateA and templateB load templates/templateA.json and templates/templateB.json.
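One way to picture the mapping from option name to template data is sketched below. The function `load_template`, the `base_dir` parameter, and the contents of `DEFAULT_AUTO` are assumptions for illustration; the real built-in values and JSON schema are not shown in this README.

```python
import copy
import json
from pathlib import Path

# Hypothetical built-in settings; the actual default_auto values are not documented here.
DEFAULT_AUTO = {"regions": [], "text_patterns": [], "auto_detect": True, "keywords": []}

# File locations taken from the Project Files list.
TEMPLATE_FILES = {
    "templateA": Path("templates/templateA.json"),
    "templateB": Path("templates/templateB.json"),
}


def load_template(name: str, base_dir: Path = Path(".")) -> dict:
    """Return template settings for one of the three GUI options."""
    if name == "default_auto":
        # Built in: no JSON file is read.
        return copy.deepcopy(DEFAULT_AUTO)
    if name not in TEMPLATE_FILES:
        raise ValueError(f"unknown template: {name!r}")
    with open(base_dir / TEMPLATE_FILES[name], encoding="utf-8") as fh:
        return json.load(fh)
```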

Redaction Logic (Current Implementation)

For each page, the engine combines:

  • Fixed template regions (regions)
  • Regex-based text matches (text_patterns)
  • Optional auto-detection (auto_detect)
    • keyword word-matching (keywords)
    • corner image scan with area safety cap
    • corner vector scan with area safety cap

It then applies the redactions and saves the file with PyMuPDF's cleanup options (garbage=4, clean=True), which drop unreferenced objects and sanitize the page content streams.
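The per-page flow might look roughly like the sketch below. It is not the code in `redaction_engine.py`: the helper names, the template layout (regions as `[x0, y0, x1, y1]` lists, `auto_detect` as a boolean), the 30% corner zone, and the 25% area cap are all assumptions. Only the PyMuPDF calls (`get_text("words")`, `get_image_info`, `get_drawings`, `add_redact_annot`, `apply_redactions`, `save`) are real library API.

```python
import re
from typing import Iterable, List, Sequence, Tuple

RectT = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page points


def area(r: RectT) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def match_words(words: Iterable[Sequence], patterns: List[str], keywords: List[str]) -> List[RectT]:
    """Return boxes of words that match a regex or equal a keyword (case-insensitive).

    `words` uses the tuple layout of page.get_text("words"): (x0, y0, x1, y1, text, ...).
    """
    compiled = [re.compile(p) for p in patterns]
    wanted = {k.lower() for k in keywords}
    return [tuple(w[:4]) for w in words
            if w[4].lower() in wanted or any(c.search(w[4]) for c in compiled)]


def in_corner(rect: RectT, page_rect: RectT, frac: float = 0.3) -> bool:
    """True if the box centre lies in one of the four page corners (assumed zone size)."""
    cx, cy = (rect[0] + rect[2]) / 2, (rect[1] + rect[3]) / 2
    w, h = page_rect[2] - page_rect[0], page_rect[3] - page_rect[1]
    near_x = cx <= page_rect[0] + frac * w or cx >= page_rect[2] - frac * w
    near_y = cy <= page_rect[1] + frac * h or cy >= page_rect[3] - frac * h
    return near_x and near_y


def within_cap(rect: RectT, page_rect: RectT, max_fraction: float = 0.25) -> bool:
    """Area safety cap: skip auto-detected boxes that cover too much of the page."""
    return area(rect) <= max_fraction * area(page_rect)


def redact_pdf(src: str, dst: str, template: dict) -> None:
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    doc = fitz.open(src)
    for page in doc:
        auto = bool(template.get("auto_detect"))
        rects = [tuple(r) for r in template.get("regions", [])]
        rects += match_words(page.get_text("words"),
                             template.get("text_patterns", []),
                             template.get("keywords", []) if auto else [])
        if auto:
            page_rect = tuple(page.rect)
            candidates = [tuple(i["bbox"]) for i in page.get_image_info()]
            candidates += [tuple(d["rect"]) for d in page.get_drawings()]
            rects += [c for c in candidates
                      if area(c) > 0 and in_corner(c, page_rect) and within_cap(c, page_rect)]
        for r in rects:
            page.add_redact_annot(fitz.Rect(r), fill=(1, 1, 1))
        # Library defaults decide how overlapping images and vector graphics are handled.
        page.apply_redactions()
    doc.save(dst, garbage=4, clean=True)
    doc.close()
```

The area cap exists so that a full-page background image or drawing border is never picked up by the corner scan and wiped out along with the rest of the sheet.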

Requirements

  • macOS
  • Python 3.10+

Install

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Optional drag-and-drop support:

  • Install tkinterdnd2 (already listed in requirements.txt)
  • If unavailable, the app still works via Add PDFs
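The optional dependency can be handled with a guarded import, as in this minimal sketch. The function names `dnd_available` and `make_root` are hypothetical and not taken from `app.py`; `TkinterDnD.Tk` is the drop-aware root window class provided by tkinterdnd2.

```python
import importlib.util


def dnd_available() -> bool:
    """True if the optional tkinterdnd2 package is installed."""
    return importlib.util.find_spec("tkinterdnd2") is not None


def make_root():
    """Create the main window, enabling drag-and-drop only when it is available."""
    if dnd_available():
        from tkinterdnd2 import TkinterDnD
        return TkinterDnD.Tk()
    # Fallback: a plain Tk window; files are then added through the Add PDFs button.
    import tkinter as tk
    return tk.Tk()
```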

Run GUI

python3 app.py

Workflow:

  1. Select template (default_auto, templateA, or templateB)
  2. Add one or more PDF files
  3. Click Process
  4. Check outputs in _prodc/ (or your selected output folder)

Run CLI

python3 app.py --cli \
  --input /path/a.pdf /path/b.pdf \
  --template templates/templateA.json \
  --output-dir /path/_prodc

Notes:

  • If --template is omitted in CLI, built-in default_auto is used
  • If --output-dir is omitted, each output is written to a _prodc/ folder next to its source file
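The flags shown above could be declared with `argparse` along these lines. This is a sketch of an equivalent parser, not a copy of `app.py`; the function name `build_parser` and the help strings are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="app.py", description="Engineering PDF Redactor")
    parser.add_argument("--cli", action="store_true", help="run without the GUI")
    parser.add_argument("--input", nargs="+", type=Path, default=[],
                        help="one or more PDF files")
    # None means: use the built-in default_auto template.
    parser.add_argument("--template", type=Path, default=None,
                        help="path to a template JSON file")
    # None means: write to a _prodc/ folder next to each input file.
    parser.add_argument("--output-dir", type=Path, default=None,
                        help="folder for redacted PDFs")
    return parser
```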

Tkinter on Homebrew Python

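If the GUI fails to launch with No module named '_tkinter', install Tk support for your Homebrew Python (the formula version should match your Python version, 3.13 in this example) and recreate the virtual environment: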

brew install python-tk@3.13
deactivate  # if venv is active
rm -rf .venv
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Validation (Acrobat Acceptance Check)

In Acrobat Pro:

  1. Open processed PDF
  2. Use Select All Objects
  3. Verify that no recoverable, selectable objects remain inside the sensitive regions
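A quick scripted pre-check for leftover text can complement the Acrobat inspection. The sketch below is an assumption, not part of this repository: `leftover_text` and `check_pdf` are hypothetical names, and the check covers text only. Images and vector objects are not checked, because the fill rectangle drawn by a redaction is itself a vector object and would show up as a false positive, so the Acrobat check remains the acceptance test.

```python
from typing import List, Sequence, Tuple

RectT = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page points


def overlaps(a: RectT, b: RectT) -> bool:
    """True if two boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def leftover_text(page, regions: Sequence[RectT]) -> List[str]:
    """Return words that still overlap any of the given regions.

    `page` is expected to behave like a PyMuPDF page, whose get_text("words")
    yields tuples of the form (x0, y0, x1, y1, text, ...).
    """
    return [w[4] for w in page.get_text("words")
            if any(overlaps(tuple(w[:4]), r) for r in regions)]


def check_pdf(path: str, regions: Sequence[RectT]) -> List[str]:
    """Collect leftover words across all pages of a processed PDF."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    found: List[str] = []
    with fitz.open(path) as doc:
        for page in doc:
            found.extend(leftover_text(page, regions))
    return found
```

An empty result from `check_pdf` means no extractable words remain in those regions; it says nothing about images or line art.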

Project Files

  • app.py: GUI + CLI entry
  • redaction_engine.py: core redaction engine
  • templates/templateA.json: Template A
  • templates/templateB.json: Template B
  • requirements.txt: dependencies
