Production-style fullstack system for detecting and redacting Personally Identifiable Information (PII) in document images using a hybrid AI pipeline.
- End-to-end workflow: authentication, upload, detection, decisioning, and redacted output.
- Multi-stage AI pipeline: OCR + regex + NER + policy-aware decisions.
- Fullstack delivery: Flutter client and Flask backend in one repository.
- Audit-focused: redaction results and processing logs are available in the app.
PII_FULLSTACK/
├─ PII/ # Main application
│ ├─ lib/ # Flutter app (UI + state + services)
│ ├─ modules/ # AI pipeline modules (OCR, regex, NER, hybrid, RAG, redaction)
│ ├─ app.py # Flask backend entry point
│ ├─ requirements.txt # Python dependencies
│ ├─ pubspec.yaml # Flutter dependencies
│ ├─ RUNNING_GUIDE.md
│ └─ TESTING_GUIDE.md
├─ README.md # This file
└─ LICENSE
Flutter Client
|
| HTTP API
v
Flask Backend (app.py)
|
+--> OCR Engine (Tesseract + OpenCV)
+--> Regex Detector
+--> NER Detector (SpaCy)
+--> Hybrid Fusion Engine
+--> Policy Decision Engine (RAG/FAISS fallback-aware)
+--> Redaction Engine (text + image)
|
+--> MySQL (users, auth, audit logs)
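The regex, NER, and fusion stages in the diagram above can be sketched as follows. This is a minimal illustration, not the code in modules/: the `Detection` class, the two patterns, the confidence scores, and the "highest score wins on overlap" rule are all assumptions, and the NER hits are hard-coded stand-ins for what a SpaCy model would return.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    label: str
    score: float  # confidence in [0, 1] (hypothetical values)
    source: str   # "regex" or "ner"


# Illustrative patterns only; production patterns would be stricter.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{10}\b"),
}


def regex_detect(text):
    """Return one Detection per pattern match."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append(Detection(m.start(), m.end(), label, 0.95, "regex"))
    return found


def fuse(detections):
    """Resolve overlaps by keeping the higher-scoring detection."""
    kept = []
    for det in sorted(detections, key=lambda d: (-d.score, d.start)):
        # Keep det only if it does not overlap anything already kept.
        if all(det.end <= k.start or det.start >= k.end for k in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d.start)


text = "Contact Jane Doe at jane.doe@example.com or 9876543210."
# Stand-in for NER output; the second hit is a spurious match inside the email.
ner_hits = [
    Detection(8, 16, "PERSON", 0.80, "ner"),
    Detection(20, 28, "PERSON", 0.40, "ner"),
]
fused = fuse(regex_detect(text) + ner_hits)
for d in fused:
    print(d.label, repr(text[d.start:d.end]), d.source)
```

In this sketch the low-confidence PERSON hit is dropped because it overlaps the higher-scoring EMAIL match, which is one plausible reading of "confidence-aware fusion".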
git clone https://github.com/Da-ya7/PII_FULLSTACK.git
cd PII_FULLSTACK/PII
python -m venv .venv
# Windows
.venv\Scripts\activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
copy .env.example .env
Update .env with your local configuration values.
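To show what reading those values might look like, here is a small standard-library sketch that parses KEY=VALUE text. The key names are hypothetical and are not taken from .env.example, and the backend may load its configuration differently (for example through a dotenv package).

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical keys, for illustration only.
sample = """
# local development settings
DB_HOST=127.0.0.1
DB_PASSWORD="change-me"
"""
config = parse_env(sample)
print(config["DB_HOST"])
```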
python app.py
Backend health endpoint:
http://127.0.0.1:5000/api/health
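A quick way to confirm the backend is up before starting the client is to request that endpoint from a script. The sketch below uses only the Python standard library and assumes the endpoint answers with HTTP 200 when healthy; the response body format is not specified here, so it is not inspected.

```python
import urllib.error
import urllib.request


def is_healthy(url="http://127.0.0.1:5000/api/health", timeout=3.0):
    """Return True if the endpoint answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


if __name__ == "__main__":
    print("backend up" if is_healthy() else "backend not reachable")
```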
flutter pub get
flutter run -d chrome --dart-define=API_BASE_URL=http://127.0.0.1:5000 --web-port=5080
Client URL:
http://localhost:5080
- Secure user registration/login.
- Document upload and processing from Flutter UI.
- OCR extraction with bounding boxes.
- Structured PII detection via regex patterns.
- Contextual PII detection via NER.
- Hybrid confidence-aware fusion.
- Policy-driven action (full redact, partial mask, keep).
- Downloadable redacted results.
- Audit history for processed files.
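The three policy actions listed above (full redact, partial mask, keep) can be illustrated on plain text. This is a sketch under stated assumptions: the `POLICY` table, the `[LABEL]` placeholder format, and the "show last four characters" masking rule are hypothetical, and image redaction is out of scope here.

```python
def mask_partial(value, visible=4, fill="*"):
    """Mask all but the last `visible` characters."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


# Hypothetical policy table: label -> action.
POLICY = {"EMAIL": "redact", "PHONE": "mask", "ORG": "keep"}


def apply_policy(text, spans, policy=POLICY, default="redact"):
    """Rewrite text given non-overlapping (start, end, label) spans."""
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        original = text[start:end]
        action = policy.get(label, default)
        if action == "keep":
            out.append(original)
        elif action == "mask":
            out.append(mask_partial(original))
        else:
            # Full redaction replaces the value with its label.
            out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Acme Corp: call 9876543210 or mail jane@example.com"
spans = [(0, 9, "ORG"), (16, 26, "PHONE"), (35, 51, "EMAIL")]
print(apply_policy(text, spans))
# prints: Acme Corp: call ******3210 or mail [EMAIL]
```

Labels missing from the table fall back to full redaction in this sketch, so an unrecognized PII type is hidden rather than leaked.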
- Frontend: Flutter, Provider, Material 3
- Backend: Flask, Flask-CORS, bcrypt
- AI/NLP: Tesseract OCR, OpenCV, SpaCy, sentence-transformers, FAISS
- Data: MySQL
- App-level README: PII/README.md
- Runtime guide: PII/RUNNING_GUIDE.md
- Testing guide: PII/TESTING_GUIDE.md
- .env, local uploads, generated build artifacts, and dataset folders are ignored via .gitignore.
- Use non-production credentials locally and rotate secrets before deployment.
Licensed under MIT. See LICENSE.