nirjanadas/policy-wise

PolicyWise Screenshot

🛡️ PolicyWise — AI Legal Policy Assistant

An intelligent assistant that analyzes policy and legal clauses using RAG, a custom ML risk classifier, and LLM-powered explanations.


✨ What is PolicyWise?

PolicyWise is an AI-powered tool that helps compliance and legal teams quickly evaluate policy and legal clauses.

It combines:

  • RAG (Retrieval-Augmented Generation) → Finds relevant text inside uploaded PDF policies
  • Machine Learning Classifier → Predicts if a clause is COMPLIANT or RISKY
  • LLM Explanation (OpenAI) → Gives clear explanations and safer rewrites

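Together, these three parts make PolicyWise an internal assistant for reviewing policy documents: retrieval supplies the context, the classifier supplies a risk label, and the LLM explains the result.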


🚀 Features

🔍 1. Document Search (RAG)

Upload PDF policy documents.
PolicyWise will:

  • Extract text
  • Break it into chunks
  • Create embeddings
  • Use FAISS to retrieve the most relevant sections
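The chunking step can be sketched in a few lines. This is a minimal illustration, not the code in app.py: the function name `chunk_text` and the default `chunk_size` and `overlap` values are assumptions chosen for the example, and the split is by characters rather than tokens or sentences.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into character windows where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. Each chunk would then be embedded and added to the FAISS index.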

🛡️ 2. Risk Classifier (ML Model)

A TF-IDF + Logistic Regression classifier that I trained myself.
For each clause it predicts one of two labels, together with a confidence score:

  • COMPLIANT
  • RISKY
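A rough sketch of how a TF-IDF + Logistic Regression classifier can produce a label and a confidence score with scikit-learn. The training clauses below are made up for illustration and are not the data used by train_model.py. The helper names `train_classifier` and `classify` are hypothetical, and the sketch bundles both steps into one scikit-learn pipeline for brevity, whereas the repository saves the model and vectorizer as separate pickle files.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, made-up examples for illustration only (not the project's training data).
TRAIN_CLAUSES = [
    "Customer data is encrypted at rest and in transit.",
    "Access to personal data is logged and reviewed quarterly.",
    "Data is retained only as long as required by law.",
    "We may share user data with any third party at our discretion.",
    "Data may be kept indefinitely for unspecified purposes.",
    "Employees can export customer records without approval.",
]
TRAIN_LABELS = ["COMPLIANT", "COMPLIANT", "COMPLIANT", "RISKY", "RISKY", "RISKY"]


def train_classifier(clauses, labels):
    """Fit a TF-IDF vectorizer followed by Logistic Regression."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(clauses, labels)
    return model


def classify(model, clause):
    """Return (label, confidence), where confidence is the winning class probability."""
    probabilities = model.predict_proba([clause])[0]
    best = probabilities.argmax()
    return str(model.classes_[best]), float(probabilities[best])


model = train_classifier(TRAIN_CLAUSES, TRAIN_LABELS)
label, confidence = classify(model, "User data may be shared with partners at any time.")
print(label, round(confidence, 2))
```

With only two classes, the confidence reported this way is always at least 0.5, so values close to 0.5 are best read as "the model is unsure".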

🤖 3. AI Explanation (LLM-Enhanced)

If an OpenAI key is provided, PolicyWise can:

  • Explain why a clause is risky
  • Highlight dangerous wording
  • Suggest a safer rewrite
  • Use RAG + ML to give better, more contextual answers
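One way to picture how RAG and ML output feed the LLM is as a single prompt assembled from the clause, the retrieved snippets, and the classifier result. The sketch below only builds the prompt string and does not call OpenAI. The function name `build_review_prompt` and the prompt wording are assumptions for illustration, not the text used in app.py.

```python
def build_review_prompt(clause: str, snippets: list[str], label: str, confidence: float) -> str:
    """Combine the user clause, retrieved policy snippets, and ML output into one prompt."""
    # Number the snippets so the model can refer back to them.
    context = "\n".join(f"[{i}] {snippet}" for i, snippet in enumerate(snippets, start=1))
    if not context:
        context = "(no policy snippets retrieved)"

    return (
        "You are reviewing a policy clause for compliance risk.\n\n"
        f"Clause:\n{clause}\n\n"
        f"Relevant policy snippets:\n{context}\n\n"
        f"Classifier result: {label} (confidence {confidence:.2f})\n\n"
        "Explain why the clause is or is not risky, point out vague or "
        "dangerous wording, and suggest a safer rewrite."
    )


prompt = build_review_prompt(
    clause="We may share user data with partners at our discretion.",
    snippets=["Personal data must not be shared without explicit consent."],
    label="RISKY",
    confidence=0.87,
)
print(prompt)
```

The resulting string would be sent as the user message to a chat model. Keeping prompt construction in its own function makes it easy to inspect what the model actually sees.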

📁 Project Structure

policy-wise/
│
├── app.py                 # Main Streamlit application
├── train_model.py         # Training script for ML classifier
├── requirements.txt       # Project dependencies
├── README.md              # Documentation
│
├── policy_model.pkl       # (Optional) Saved ML classifier
├── policy_vectorizer.pkl  # (Optional) Saved TF-IDF vectorizer
│
├── assets/
│   └── policywise.png     # Screenshot for README
│
├── .streamlit/
│   └── config.toml        # Technical blue theme for UI
│
├── LICENSE
└── .gitignore             # Ignored files (venv, .env, cache, etc.)

🧱 Architecture

Here’s a compact high-level overview of how PolicyWise processes, analyzes, and evaluates policy text:

🧑‍💻 User (Streamlit UI) → 📄 PDF Processing (Extract + Chunk + Embed)
→ 🔍 FAISS Search (RAG) → 🛡️ ML Classifier (TF-IDF + LR) → 🤖 LLM Review (if an OpenAI key is provided) → 📤 Final Output

🛠️ Installation

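1️⃣ Create and activate a virtual environment (the activation command below is for Windows; on macOS/Linux use source venv/bin/activate instead)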

python -m venv venv
venv\Scripts\activate

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Train the ML model

python train_model.py

4️⃣ Run the Streamlit application

streamlit run app.py

🚀 How It Works


  1. 📄 Upload Policy PDFs

    • Extract text using PyPDF
    • Split into overlapping chunks
    • Create embeddings (OpenAI)
    • Store vectors in FAISS index
  2. ✍️ Enter a Clause

    • Convert clause → embedding
    • Search FAISS for top-matching policy snippets (RAG)
  3. 🛡️ ML Risk Classification

    • TF-IDF vectorizer transforms text
    • Logistic Regression predicts ✅ COMPLIANT or ❌ RISKY
    • Outputs label + confidence score
  4. 🤖 LLM Review

    • Combine: user clause + retrieved policy snippets + ML output
    • AI generates:
      • Explanation of risk
      • Highlighted vague phrases
      • A safer rewritten version
  5. 📤 Final Output

    • ML prediction
    • Relevant policy snippets (RAG)
    • LLM explanation + rewrite
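The retrieval in step 2 comes down to "find the stored vectors closest to the query vector". The sketch below shows that idea with a brute-force cosine-similarity search in plain Python. It is a stand-in for what a FAISS index does far more efficiently, not FAISS itself, and the two-dimensional vectors are toy values in place of real OpenAI embeddings. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, chunks, k=3):
    """Return the k (score, chunk) pairs most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings of three policy chunks.
chunks = ["data sharing policy", "office dress code", "data retention policy"]
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

results = top_k([1.0, 0.0], chunk_vecs, chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

The retrieved chunks are what gets passed to the LLM alongside the clause and the classifier output in step 4.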

🧰 Tech Stack


FRONTEND 🖥️

| Technology | Purpose |
| --- | --- |
| 🎨 Streamlit | UI & user interaction |
| 🐍 Python | Core language |

BACKEND / PROCESSING ⚙️

| Technology | Purpose |
| --- | --- |
| 📄 PyPDF | Extract text from PDFs |
| ✂️ Custom Chunking | Split policy text into chunks |
| 🧠 OpenAI Embeddings | Convert text into vectors |
| 🗃️ FAISS Vector DB | Fast semantic search (RAG) |
| 📊 Scikit-learn | ML toolkit |
| 🧩 TF-IDF Vectorizer | Transform text for ML model |
| 🛡️ Logistic Regression | Classify COMPLIANT / RISKY |

AI LAYER 🤖

| Technology | Purpose |
| --- | --- |
| 🧠 OpenAI Chat Models | Explanation + safer rewrite |
| 🔍 RAG Pipeline | Retrieve relevant policy snippets |

UTILITIES 🔧

| Technology | Purpose |
| --- | --- |
| 🔑 Python-dotenv | Load environment variables |
| 💾 Pickle | Save model & vectorizer |
| 🔢 NumPy | Numerical operations |
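Since the LLM features only run when an OpenAI key is provided, the app needs some check along these lines. This is a small sketch under assumptions: the variable name `OPENAI_API_KEY` is the conventional one but is not confirmed by this README, and `llm_enabled` is a hypothetical helper. In the real app, python-dotenv's `load_dotenv()` would typically be called first so that values from a .env file land in `os.environ`.

```python
import os


def llm_enabled(env=None) -> bool:
    """Return True when a non-empty OpenAI key is present in the environment.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    The variable name OPENAI_API_KEY is an assumption, not taken from the repository.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if llm_enabled():
    print("LLM explanations enabled")
else:
    print("No OpenAI key found: showing ML prediction only")
```

Keeping the key in a .env file that is listed in .gitignore avoids committing it to the repository.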

💡 Why PolicyWise Matters

  • Helps Legal teams quickly evaluate compliance risks
  • Reduces manual effort in reviewing internal policies
  • Uses a hybrid AI system (RAG + ML + LLM), similar to real enterprise tools
  • Demonstrates applied knowledge of NLP, vector search, and model pipelines

🚧 Future Enhancements

  • Add LegalBERT for deeper clause understanding
  • Add metadata-based RAG (policy titles, categories)
  • Deploy with Docker / Streamlit Cloud / Hugging Face
  • Add authentication for internal company use
  • Add clause history + downloadable reports

About

A RAG-based compliance assistant that checks AI/data scenarios against company policies.
