A searchable transparency database for DOJ Epstein Files, providing public access to documents, emails, flight logs, and AI-powered search.
Live Site: epsteinsuite.com
- About
- Features
- Screenshots
- Tech Stack
- Getting Started
- Database Schema
- Data Pipeline
- Contributing
- License
- Acknowledgments
Epstein Suite is a web application that makes public records related to the Epstein case searchable and accessible. It combines traditional document management with modern AI-powered features to help researchers, journalists, and the public explore this important dataset.
After the DOJ released thousands of pages of Epstein-related documents, they were difficult to search and navigate. This project makes them:
- Searchable - Full-text search across all documents and OCR'd pages
- Organized - Entity extraction (people, organizations, locations)
- Interactive - AI-powered Q&A interface
- Transparent - Open source code, public mission
The live site at epsteinsuite.com currently indexes:
- 4,700+ documents from DOJ, FBI Vault, House Oversight
- Millions of OCR'd pages with full-text search
- Thousands of extracted entities (people, organizations, locations)
- Flight logs with geographic mapping
- Email threads with relationship analysis
- Full-Text Search - MySQL FULLTEXT search across documents and OCR'd pages (see the query sketch after this list)
- Entity Browser - Explore people, organizations, and locations
- Advanced Filters - Filter by source, date, file type, status
- Document Timeline - Chronological view of documents
- Ask AI - Natural language Q&A powered by OpenAI GPT-5-nano
- Document Summaries - AI-generated summaries for complex documents
- Entity Extraction - Automatic identification of key people/organizations
- Semantic Search - Vector embeddings for similarity-based search
- Flight Logs - Searchable flight manifests with map visualization
- Email Client - Thread view for email collections
- Photo Gallery - Media browser with metadata
- Network Graphs - Entity relationship visualizations with D3.js
- File-Based Caching - Fast page loads with intelligent cache invalidation
- Responsive Design - Mobile-first UI with TailwindCSS
- Admin Dashboard - Operations console for monitoring
- API Endpoints - JSON APIs for integrations
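For a sense of how the full-text feature maps onto MySQL, here is a minimal PDO sketch of a FULLTEXT query against the `documents` table. It assumes a FULLTEXT index over `(title, description)` and placeholder credentials; the production query and index definition live in the application code and `config/schema.sql`, so treat this as illustrative only.

```php
<?php
declare(strict_types=1);

// Minimal sketch of a FULLTEXT search over documents.
// Assumes a FULLTEXT index on (title, description); adjust to match
// the actual index definition in config/schema.sql.
$pdo = new PDO(
    'mysql:host=localhost;dbname=epstein_db;charset=utf8mb4',
    'root',
    'your_password',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);

$stmt = $pdo->prepare(
    'SELECT id, title, description,
            MATCH(title, description) AGAINST(:q1 IN NATURAL LANGUAGE MODE) AS score
     FROM documents
     WHERE MATCH(title, description) AGAINST(:q2 IN NATURAL LANGUAGE MODE)
     ORDER BY score DESC
     LIMIT 20'
);
$stmt->execute(['q1' => 'flight logs', 'q2' => 'flight logs']);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    printf("%s (score %.2f)\n", $row['title'], (float) $row['score']);
}
```

The same pattern applies to per-page OCR search, swapping in the page table's FULLTEXT-indexed `ocr_text` column.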
Note: Add screenshots of your live site here once you've added them to the repo
- PHP 8.4 - Strict typing, PSR-12 coding standard
- MySQL 8.0 - InnoDB engine, FULLTEXT indexes, utf8mb4
- Apache/PHP-FPM - Production web server
- TailwindCSS 3.4 - Utility-first CSS framework
- Vanilla JavaScript - No framework dependencies
- D3.js - Network graph visualizations
- Leaflet.js - Flight log mapping
- OpenAI GPT-5-nano - Document summaries, entity extraction, Q&A
- text-embedding-3-small - Vector embeddings for semantic search
- PHP Vector Search - Cosine similarity computed in PHP (see the sketch after this list)
- Flat PHP Routing - No framework, direct file-to-URL mapping
- PDO Singleton - Centralized database access
- File-Based Cache - Simple, fast caching layer
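The "PHP Vector Search" item above refers to similarity scoring done in application code rather than in the database. As a rough sketch (not the production implementation), cosine similarity over two embedding arrays can be computed like this; how the embeddings are generated (e.g. with text-embedding-3-small) and where they are stored is outside the snippet.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity between two embedding vectors, computed in plain PHP.
 *
 * @param float[] $a
 * @param float[] $b
 */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Usage: rank stored document embeddings against a query embedding.
$queryEmbedding = [0.12, -0.08, 0.33];         // stand-in values
$documents = [
    42 => [0.10, -0.05, 0.30],                 // document_id => embedding
    57 => [-0.40, 0.22, 0.01],
];

$scores = [];
foreach ($documents as $id => $embedding) {
    $scores[$id] = cosineSimilarity($queryEmbedding, $embedding);
}
arsort($scores); // highest similarity first
```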
- PHP 8.4 or higher
- MySQL 8.0 or higher
- Git
- (Optional) OpenAI API key for AI features
- Clone the repository

  ```bash
  git clone https://github.com/YOUR_USERNAME/epstein-suite.git
  cd epstein-suite
  ```

- Create database

  ```bash
  mysql -u root -p
  CREATE DATABASE epstein_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
  exit
  ```

- Import database schema

  ```bash
  mysql -u root -p epstein_db < config/schema.sql
  ```

- Configure environment

  ```bash
  cp .env.example .env
  # Edit .env with your settings
  nano .env
  ```

  Minimum required settings:

  ```
  DB_HOST=localhost
  DB_NAME=epstein_db
  DB_USERNAME=root
  DB_PASSWORD=your_password
  ADMIN_PASSWORD=your_admin_password
  ```

- Start development server

  ```bash
  php -S localhost:8000
  ```

- Visit http://localhost:8000
The production database is not included. For local development, you can:
Option A: Work with empty database
- Good for UI/UX development
- Test edge cases with no data
Option B: Create test data

```sql
-- Create sample documents
INSERT INTO documents (title, description, status, file_url, source, created_at) VALUES
('Sample Document 1', 'A test document for development', 'processed', 'https://example.com/doc1.pdf', 'TEST', NOW()),
('Sample Document 2', 'Another test document', 'processed', 'https://example.com/doc2.pdf', 'TEST', NOW());

-- Create sample entities
INSERT INTO entities (name, type, created_at) VALUES
('John Doe', 'PERSON', NOW()),
('Acme Corporation', 'ORG', NOW()),
('New York', 'LOCATION', NOW());

-- Link entities to documents
INSERT INTO document_entities (document_id, entity_id) VALUES
(1, 1), (1, 2), (2, 1), (2, 3);
```

For production deployment:
- Configure `.env` with production credentials
- Set up Apache/Nginx with PHP-FPM
- Enable HTTPS with Let's Encrypt
- Configure file permissions:

  ```bash
  chmod 755 *.php
  chmod 775 cache/
  ```

- Set up admin authentication (HTTP Basic Auth)
- Configure caching headers in `.htaccess`
See TECH.md for detailed production setup (if you include it).
The application uses these core tables:
`documents` - Main document metadata with full lifecycle tracking:
- id, title, description, file_url, source
- status (pending → downloaded → processed)
- ai_summary, created_at, updated_at

OCR pages table - OCR text per page with a FULLTEXT index:
- document_id, page_number, ocr_text
- FULLTEXT INDEX(ocr_text)

`entities` - People, organizations, locations:
- id, name, type (PERSON/ORG/LOCATION)
- created_at

`document_entities` - Many-to-many document-entity relationships:
- document_id, entity_id

Additional tables:
- `emails` - Email threads (FULLTEXT indexed)
- `flight_logs` - Flight manifest records
- `passengers` - Flight passenger details
- `ai_sessions` - AI chat session tracking
- `ai_messages` - AI conversation history
- `ai_citations` - Document citations in AI responses
See config/schema.sql for complete schema definition.
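To make the many-to-many link concrete, here is a hedged PDO sketch that lists the entities attached to one document via `document_entities`. Table and column names follow the notes above and the sample data in Getting Started; `config/schema.sql` remains the authoritative definition.

```php
<?php
declare(strict_types=1);

// Sketch: fetch all entities linked to one document through the
// document_entities join table. Column names follow the schema notes above.
$pdo = new PDO('mysql:host=localhost;dbname=epstein_db;charset=utf8mb4', 'root', 'your_password');

$stmt = $pdo->prepare(
    'SELECT e.id, e.name, e.type
     FROM entities e
     JOIN document_entities de ON de.entity_id = e.id
     WHERE de.document_id = :document_id
     ORDER BY e.type, e.name'
);
$stmt->execute(['document_id' => 1]);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $entity) {
    printf("%s (%s)\n", $entity['name'], $entity['type']);
}
```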
Note: The data ingestion pipeline is not included in this open source release.
The production site uses a proprietary pipeline that:
- Discovers documents from DOJ, FBI Vault, and House Oversight sources
- Downloads and OCRs PDF documents (pdf2image + Tesseract)
- Generates AI summaries and extracts entities (OpenAI GPT-5-nano)
- Analyzes flight logs for significance scoring
- Generates vector embeddings for semantic search
To use this application with your own data, you'll need to:
- Populate the `documents` table with your dataset (see the sketch below)
- Run your own OCR/processing pipeline
- Generate AI summaries and entity extractions
- Follow the database schema in `config/schema.sql`
The web application works with any dataset that follows the schema structure.
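If you are bringing your own dataset, a minimal starting point (a sketch, not the project's ingestion code) is a PDO prepared insert into `documents`, mirroring the sample rows shown earlier; adapt the columns to whatever `config/schema.sql` actually defines.

```php
<?php
declare(strict_types=1);

// Sketch: load one record of your own dataset into the documents table.
// Columns mirror the sample data above; verify against config/schema.sql.
$pdo = new PDO(
    'mysql:host=localhost;dbname=epstein_db;charset=utf8mb4',
    'root',
    'your_password',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);

$stmt = $pdo->prepare(
    'INSERT INTO documents (title, description, status, file_url, source, created_at)
     VALUES (:title, :description, :status, :file_url, :source, NOW())'
);

$stmt->execute([
    'title'       => 'My Dataset Document',
    'description' => 'Imported from a local archive',
    'status'      => 'pending',          // pending -> downloaded -> processed
    'file_url'    => 'https://example.com/my-doc.pdf',
    'source'      => 'MY_SOURCE',
]);
```

From there, your own pipeline can fill in OCR text, AI summaries, and entity links following the same schema.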
We welcome contributions! See CONTRIBUTING.md for detailed guidelines.
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes
- Test locally: `php -S localhost:8000`
- Commit: `git commit -m 'Add amazing feature'`
- Push: `git push origin feature/amazing-feature`
- Open a Pull Request
- UI/UX improvements (mobile, accessibility, dark mode)
- Search enhancements (better filters, faceted search)
- Performance optimizations
- Data visualizations
- Automated testing (we have none!)
- Documentation improvements
- Follow PSR-12 coding standard
- Use strict typing: `declare(strict_types=1);`
- Always use PDO prepared statements
- Test your changes locally
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
This means:
- You can use, modify, and distribute this code
- If you run a modified version as a web service, you must release your source code
- All derivative works must also be AGPL-3.0 licensed
- You must credit the original project
Data Pipeline Exception: The data ingestion pipeline, scrapers, and automation scripts are proprietary and not included in this release.
See LICENSE for full details.
Kevin Champlin
- Website: kevinchamplin.com
- Email: info@epsteinsuite.com
- Production Site: epsteinsuite.com
- DOJ, FBI, and House Oversight for releasing these public records
- OpenAI for GPT-5 and embedding APIs
- The open source community for tools like TailwindCSS, D3.js, and Leaflet.js
- Tesseract OCR project
- Everyone committed to transparency and accountability
- Bug Reports: Open an issue
- Feature Requests: Open an issue with [FEATURE] tag
- Security Issues: Email info@epsteinsuite.com privately
- General Questions: GitHub Discussions
This project is designed with strict privacy protections:
- AI prompts explicitly forbid un-redacting victim names
- All data sources are already-public records
- Focus is on investigative leads, not victim identification
- Victim privacy is paramount
If you find this project useful, please consider giving it a star on GitHub!
Built for transparency. Designed for accountability.



