langchain-samples/document-rag-multi-agent


Document RAG Analysis

A multi-agent RAG system that extracts information from documents, generates summaries, and fact-checks the output against the source material. Built with the Deep Agents framework.

Architecture

document-rag-analysis/
├── agents/
│   ├── extractor.py      — analyze_document tool (index, retrieve, deduplicate, summarize)
│   ├── fact_checker.py   — verify_claims tool (cross-checks summary against source chunks)
│   └── orchestrator.py   — Deep Agent wiring
├── main.py               — CLI entry point
├── requirements.txt
└── .env.example
| Agent | Role | Model |
|---|---|---|
| Orchestrator | Calls analyze_document directly, then delegates to the fact-checker subagent | claude-sonnet-4-6 |
| fact-checker subagent | Verifies each claim in the summary against retrieved source passages | claude-haiku-4-5-20251001 |
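The fact-checker's per-claim verification can be illustrated without an LLM. The sketch below is a crude lexical stand-in, not the repo's method (the real subagent reasons over passages with claude-haiku-4-5-20251001). It scores each claim by the share of its words found in the best-matching retrieved passage and maps that score onto the three verdict labels used in the fact-check report (Supported, Partially, Not Found). The thresholds and the helper names words, verdict, and fact_check are hypothetical.

```python
from __future__ import annotations

import re

# Verdict labels as they appear in the fact-check report.
SUPPORTED = "✅ Supported"
PARTIAL = "⚠️ Partially"
NOT_FOUND = "❌ Not Found"

# Hypothetical cut-offs for this lexical stand-in; the real subagent uses an LLM.
SUPPORTED_MIN = 0.8
PARTIAL_MIN = 0.4


def words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, punctuation dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def verdict(claim: str, passages: list[str]) -> str:
    """Label one claim by the largest share of its words found in any single passage."""
    claim_words = words(claim)
    if not claim_words:
        return NOT_FOUND
    coverage = max(
        (len(claim_words & words(p)) / len(claim_words) for p in passages),
        default=0.0,  # no passages retrieved -> nothing can support the claim
    )
    if coverage >= SUPPORTED_MIN:
        return SUPPORTED
    if coverage >= PARTIAL_MIN:
        return PARTIAL
    return NOT_FOUND


def fact_check(claims: list[str], passages: list[str]) -> list[tuple[str, str]]:
    """Per-claim verdicts, in the order the claims appear in the summary."""
    return [(verdict(c, passages), c) for c in claims]
```

Word overlap cannot detect negation or paraphrase, which is exactly why the actual pipeline hands this step to a model; the sketch only shows the shape of the output.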

Pipeline

Document path
    │
    ▼
analyze_document (extractor.py)
    ├── Index once → Chroma (cached across runs)
    ├── 2 broad similarity searches
    ├── Deduplicate chunks (Jaccard)
    └── Summarize in one LLM call
    │
    ▼
verify_claims (fact_checker.py)
    ├── Re-retrieve same chunks from Chroma
    └── Per-claim verdict: ✅ Supported / ⚠️ Partially / ❌ Not Found
    │
    ▼
Summary + Fact-check report
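The Jaccard deduplication step can be sketched as follows. The tokenization, the 0.8 threshold, and the function names are assumptions; extractor.py may tokenize or threshold differently. Each chunk is treated as a set of lowercase words, and a chunk is dropped when its Jaccard similarity |A ∩ B| / |A ∪ B| to any already-kept chunk reaches the threshold, so overlapping hits from the two broad searches collapse to one.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens used for set comparison."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe_chunks(chunks: list[str], threshold: float = 0.8) -> list[str]:
    """Drop chunks whose similarity to an already-kept chunk reaches the threshold.

    Order is preserved, so the first occurrence from the combined searches wins.
    """
    kept: list[str] = []
    kept_tokens: list[set[str]] = []
    for chunk in chunks:
        chunk_tokens = tokens(chunk)
        if all(jaccard(chunk_tokens, seen) < threshold for seen in kept_tokens):
            kept.append(chunk)
            kept_tokens.append(chunk_tokens)
    return kept
```

For example, concatenating the results of two searches and calling dedupe_chunks(search_a + search_b) keeps a chunk only once even if both searches returned it with different casing or punctuation. The pairwise comparison is quadratic, which is fine for the handful of chunks two searches return.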

OpenAI is used only for embeddings (text-embedding-3-small). All agent reasoning uses Claude. Vectorstore operations are excluded from LangSmith traces to stay under the 20 MB payload limit; LLM calls are fully traced.
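Since embeddings are the only OpenAI calls, caching the Chroma index across runs means an unchanged document is not re-embedded. The README does not say how a cached index is recognized. One plausible approach, sketched here as an assumption with a hypothetical collection_name helper, is to name the collection after a hash of the embedding model plus each file's name and bytes, so any edit to a document or a change of embedding model yields a new name and triggers re-indexing.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

EMBEDDING_MODEL = "text-embedding-3-small"


def collection_name(paths: list[Path], model: str = EMBEDDING_MODEL) -> str:
    """Derive a stable index name from the embedding model and every file's name and bytes.

    Same inputs give the same name (reuse the cached index); any change gives a new one.
    """
    digest = hashlib.sha256(model.encode())
    for path in sorted(paths):
        # Null-byte separators keep file boundaries unambiguous in the hash input.
        digest.update(b"\0" + path.name.encode() + b"\0")
        digest.update(path.read_bytes())
    return "doc-" + digest.hexdigest()[:16]
```

A caller would compute this name before indexing and skip the embedding step when a collection with that name already exists in the persisted store.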

Setup

pip install -r requirements.txt
cp .env.example .env   # fill in ANTHROPIC_API_KEY + OPENAI_API_KEY

Usage

# Executive summary + fact-check (default)
python main.py ./docs/report.pdf

# Bullet summary with a custom extraction query
python main.py ./docs/ --query "What are the risk factors?" --summary-type bullet

# Detailed summary of a specific file
python main.py ./docs/paper.pdf --summary-type detailed
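The examples above pass either a single file or a directory such as ./docs/. One way document_path could be expanded into a file list is sketched below; the function name collect_documents and the choice not to recurse into subdirectories are assumptions, not taken from the repo.

```python
from __future__ import annotations

from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".txt"}


def collect_documents(document_path: str) -> list[Path]:
    """Resolve the CLI path argument into a sorted list of PDF and .txt files."""
    path = Path(document_path)
    if path.is_file():
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            raise ValueError(f"unsupported file type: {path.suffix or '(none)'}")
        return [path]
    if path.is_dir():
        # Assumption: only the top level of the directory is scanned, not subfolders.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
        )
    raise FileNotFoundError(f"no such file or directory: {document_path}")
```

Sorting the result keeps the document order stable between runs, which helps when results are compared across runs.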

Options

| Flag | Default | Description |
|---|---|---|
| document_path | required | Path to a PDF, .txt file, or directory of documents |
| --query | "Extract all key information..." | What to extract from the documents |
| --summary-type | executive | Summary style: executive, detailed, or bullet |
| --thread-id | auto-generated | Session ID for conversation continuity |
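These options map naturally onto an argparse parser. The sketch is a hypothetical reconstruction, not the contents of main.py: the default query is a placeholder because the README truncates it, and "auto-generated" thread IDs are modeled here as random UUIDs.

```python
from __future__ import annotations

import argparse
import uuid

# Hypothetical placeholder: the real default query is truncated in this README.
DEFAULT_QUERY = "Extract all key information..."


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Document RAG analysis")
    parser.add_argument("document_path", help="Path to a PDF, .txt file, or directory of documents")
    parser.add_argument("--query", default=DEFAULT_QUERY, help="What to extract from the documents")
    parser.add_argument(
        "--summary-type",
        default="executive",
        choices=["executive", "detailed", "bullet"],
        help="Summary style",
    )
    parser.add_argument("--thread-id", default=None, help="Session ID for conversation continuity")
    return parser


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.thread_id is None:
        # Assumption: "auto-generated" is modeled as a random UUID.
        args.thread_id = str(uuid.uuid4())
    return args
```

argparse exposes --summary-type as args.summary_type and exits with an error for any value outside the three documented choices. Passing the same --thread-id on a later run is what would let the agent continue a previous session.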

Test Documents

A download script is included to fetch four public arXiv papers into ./docs/:

python download_test_docs.py
| File | Paper |
|---|---|
| attention_is_all_you_need.pdf | Attention Is All You Need (Transformer) |
| bert.pdf | BERT: Pre-training of Deep Bidirectional Transformers |
| gpt3.pdf | Language Models are Few-Shot Learners (GPT-3) |
| llama.pdf | LLaMA: Open and Efficient Foundation Language Models |
# Summarize a single paper
python main.py ./docs/attention_is_all_you_need.pdf --summary-type detailed

# Query across all four papers at once
python main.py ./docs/ --query "What architecture or training techniques are proposed?" --summary-type bullet
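The README does not show download_test_docs.py itself. A minimal stdlib sketch of such a script, assuming it pulls each PDF from arXiv's https://arxiv.org/pdf/<id> URL, could look like the following. The arXiv identifiers are the public IDs of the four papers listed; PAPERS, pdf_url, and download_all are hypothetical names, and the repo's script may work differently.

```python
from __future__ import annotations

import urllib.request
from pathlib import Path

# Target filename -> public arXiv identifier.
PAPERS = {
    "attention_is_all_you_need.pdf": "1706.03762",
    "bert.pdf": "1810.04805",
    "gpt3.pdf": "2005.14165",
    "llama.pdf": "2302.13971",
}


def pdf_url(arxiv_id: str) -> str:
    """arXiv serves the latest PDF version of a paper at /pdf/<id>."""
    return f"https://arxiv.org/pdf/{arxiv_id}"


def download_all(dest: Path = Path("docs")) -> None:
    """Fetch each paper into dest, skipping files that already exist."""
    dest.mkdir(parents=True, exist_ok=True)
    for filename, arxiv_id in PAPERS.items():
        target = dest / filename
        if target.exists():
            continue  # already downloaded on a previous run
        urllib.request.urlretrieve(pdf_url(arxiv_id), target)
```

Calling download_all() once populates ./docs/; skipping existing files makes repeated runs cheap, which pairs well with the cached Chroma index.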

Environment Variables

Copy .env.example to .env and fill in your keys. The .env file is gitignored and will never be committed.

ANTHROPIC_API_KEY=your-anthropic-key
OPENAI_API_KEY=your-openai-key

# LangSmith tracing (optional)
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-langsmith-key
LANGSMITH_PROJECT=document-rag-analysis
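This README does not say how .env is loaded (a library such as python-dotenv is a common choice, but that is an assumption here). Independently of the loader, a small stdlib check like the one below can catch required keys that are missing or still set to the template values before a run. The helper names read_env_file and missing_keys are hypothetical, and the parser only handles simple KEY=value lines.

```python
from __future__ import annotations

from pathlib import Path

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Required keys that are absent, empty, or still hold a template placeholder."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("your-")
    ]
```

The LangSmith variables are deliberately not in REQUIRED_KEYS, since tracing is optional.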

About

Sample document RAG and summarizer agent.
