An intelligent assistant that analyzes policy and legal clauses using RAG, a custom ML risk classifier, and LLM-powered explanations.
PolicyWise is an AI-powered tool that helps Compliance and Legal teams quickly evaluate policy or legal clauses.
It combines:
- RAG (Retrieval-Augmented Generation) → Finds relevant text inside uploaded PDF policies
- Machine Learning Classifier → Predicts if a clause is COMPLIANT or RISKY
- LLM Explanation (OpenAI) → Gives clear explanations and safer rewrites
This makes PolicyWise a smart internal assistant for reviewing documents.
Upload PDF policy documents.
PolicyWise will:
- Extract text
- Break it into chunks
- Create embeddings
- Use FAISS to retrieve the most relevant sections
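The "break it into chunks" step can be illustrated with a small sketch. This is not the code from `app.py`; the function name `chunk_text` and the `chunk_size` / `overlap` defaults are assumptions chosen for illustration, and the split is character-based for simplicity.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into character windows; neighbours share `overlap` characters.

    Hypothetical helper: the sizes are illustrative, not the app's real settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.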
A TF-IDF + Logistic Regression classifier that I trained myself using `train_model.py`.
It predicts:
- COMPLIANT
- RISKY
Each prediction comes with a confidence score.
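As a rough illustration of where a label and confidence score can come from: logistic regression maps a linear score to a probability with the sigmoid function, and the label follows from a threshold. The sketch below assumes the confidence is the probability of the predicted class and that the threshold is 0.5; neither detail is confirmed for PolicyWise, and `label_with_confidence` is a hypothetical helper.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function: maps any real score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def label_with_confidence(p_risky: float, threshold: float = 0.5) -> tuple[str, float]:
    """Turn P(RISKY) into a (label, confidence) pair.

    Assumption: confidence is the probability of whichever class is predicted.
    """
    if not 0.0 <= p_risky <= 1.0:
        raise ValueError("p_risky must be a probability in [0, 1]")
    if p_risky >= threshold:
        return "RISKY", p_risky
    return "COMPLIANT", 1.0 - p_risky


if __name__ == "__main__":
    # The linear score would come from the trained model; 1.2 is a made-up value.
    label, confidence = label_with_confidence(sigmoid(1.2))
    print(f"{label} ({confidence:.2f})")
```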
If an OpenAI key is provided, PolicyWise can:
- Explain why a clause is risky
- Highlight dangerous wording
- Suggest a safer rewrite
- Use RAG + ML to give better, more contextual answers
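A minimal sketch of how the optional LLM step might be gated and how its prompt could be assembled. It assumes the key lives in the `OPENAI_API_KEY` environment variable (the name the OpenAI Python SDK reads by default); the helper names `llm_enabled` and `build_review_prompt` and the prompt wording are hypothetical, not taken from `app.py`. The actual API call is left out.

```python
import os
from typing import Mapping, Optional, Sequence


def llm_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when a non-empty OpenAI key is available (assumed variable name)."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


def build_review_prompt(
    clause: str, snippets: Sequence[str], label: str, confidence: float
) -> str:
    """Combine the clause, retrieved snippets, and ML output into one prompt."""
    context = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    if not context:
        context = "(no policy snippets retrieved)"
    return (
        "You are reviewing a policy clause for compliance risk.\n\n"
        f"Clause:\n{clause}\n\n"
        f"Relevant policy excerpts:\n{context}\n\n"
        f"Classifier result: {label} (confidence {confidence:.2f})\n\n"
        "Explain why the clause is or is not risky, point out vague or "
        "dangerous wording, and suggest a safer rewrite."
    )


if __name__ == "__main__":
    if llm_enabled():
        print(build_review_prompt(
            "Data may be shared with partners as needed.",
            ["Personal data must not be shared without consent."],
            "RISKY",
            0.91,
        ))
    else:
        print("No OpenAI key found; showing ML and RAG results only.")
```

Passing the classifier result and retrieved excerpts in the prompt is what lets the explanation refer to the uploaded policies rather than to general knowledge alone.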
policy-wise/
│
├── app.py # Main Streamlit application
├── train_model.py # Training script for ML classifier
├── requirements.txt # Project dependencies
├── README.md # Documentation
│
├── policy_model.pkl # (Optional) Saved ML classifier
├── policy_vectorizer.pkl # (Optional) Saved TF-IDF vectorizer
│
├── assets/
│ └── policywise.png # Screenshot for README
│
├── .streamlit/
│ └── config.toml # Technical blue theme for UI
│
├── LICENSE
└── .gitignore # Ignored files (venv, .env, cache, etc.)
Here’s a compact high-level overview of how PolicyWise processes, analyzes, and evaluates policy text:
🧑‍💻 User (Streamlit UI) → 📄 PDF Processing (Extract + Chunk + Embed)
→ 🔍 FAISS Search (RAG) → 🛡️ ML Classifier (TF-IDF + LR) → 📤 Final Output
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python train_model.py
streamlit run app.py
- 📄 Upload Policy PDFs
  - Extract text using PyPDF
  - Split into overlapping chunks
  - Create embeddings (OpenAI)
  - Store vectors in FAISS index
- ✍️ Enter a Clause
  - Convert clause → embedding
  - Search FAISS for top-matching policy snippets (RAG)
- 🛡️ ML Risk Classification
  - TF-IDF vectorizer transforms text
  - Logistic Regression predicts: ✅ COMPLIANT or ❌ RISKY
  - Outputs label + confidence score
- 🤖 LLM Review
  - Combine: user clause + retrieved policy snippets + ML output
  - AI generates:
    - Explanation of risk
    - Highlighted vague phrases
    - A safer rewritten version
- 📤 Final Output
  - ML prediction
  - Relevant policy snippets (RAG)
  - LLM explanation + rewrite
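The retrieval step in the walkthrough above (clause embedding → top-matching snippets) is a nearest-neighbour search over chunk vectors. The sketch below is a pure-Python stand-in that ranks by cosine similarity; it is not FAISS and not the app's code. Whether the real index ranks by cosine similarity or L2 distance is not stated, and the toy 2-D vectors stand in for real embeddings.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k chunks most similar to the query, best match first."""
    scored = [(cosine(query_vec, vec), text) for vec, text in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(text, score) for score, text in scored[:k]]


if __name__ == "__main__":
    # Toy vectors; real embeddings would have hundreds of dimensions.
    chunks = ["data sharing rules", "leave policy", "data retention"]
    vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for text, score in top_k([1.0, 0.0], vectors, chunks, k=2):
        print(f"{score:.3f}  {text}")
```

A brute-force scan like this is linear in the number of chunks; a vector index such as FAISS serves the same purpose at larger scale.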
| Technology | Purpose |
|---|---|
| 🎨 Streamlit | UI & user interaction |
| 🐍 Python | Core language |
| 📄 PyPDF | Extract text from PDFs |
| ✂️ Custom Chunking | Split policy text into chunks |
| 🧠 OpenAI Embeddings | Convert text into vectors |
| 🗃️ FAISS Vector DB | Fast semantic search (RAG) |
| 📊 Scikit-learn | ML toolkit |
| 🧩 TF-IDF Vectorizer | Transform text for ML model |
| 🛡️ Logistic Regression | Classify COMPLIANT / RISKY |
| 🧠 OpenAI Chat Models | Explanation + safer rewrite |
| 🔍 RAG Pipeline | Retrieve relevant policy snippets |
| 🔑 Python-dotenv | Load environment variables |
| 💾 Pickle | Save model & vectorizer |
| 🔢 NumPy | Numerical operations |
- Helps Legal teams quickly evaluate compliance risks
- Reduces manual effort in reviewing internal policies
- Uses a hybrid AI system (RAG + ML + LLM), similar to real enterprise tools
- Demonstrates applied knowledge of NLP, vector search, and model pipelines
Possible future improvements:
- Add LegalBERT for deeper clause understanding
- Add metadata-based RAG (policy titles, categories)
- Deploy with Docker / Streamlit Cloud / HuggingFace
- Add authentication for internal company use
- Add clause history + downloadable reports
