
🔬 Argus — Autonomous Deep Research Engine

A production-grade multi-agent research pipeline that autonomously plans, researches, critiques, and synthesizes comprehensive cited reports from any research query.

Python 3.11 · FastAPI · LangGraph · Docker · License: MIT


What Is Argus?

Argus accepts a research query via REST API and runs a supervisor-orchestrated multi-agent pipeline that:

  1. Plans — breaks the query into focused sub-questions
  2. Researches — searches the web (Tavily), finds papers (ArXiv), and looks up background (Wikipedia)
  3. Critiques — reviews its own findings for gaps and loops back if needed
  4. Writes — synthesizes a structured markdown report with numbered citations

The entire pipeline runs asynchronously — the API returns a job_id immediately and the client polls for completion. Research jobs are persisted in SQLite. Every LLM call and agent turn is traced in LangSmith.


Architecture

User HTTP Request
      │
POST /research  (FastAPI — async, returns job_id immediately)
      │
Creates Job (UUID) ──► SQLite jobs table
      │ (background thread)
      ▼
┌─────────────────────── Supervisor Agent ───────────────────────┐
│  Reads state, decides next agent via Command(goto=...) routing  │
└────────┬───────────────────────────────────────────────────────┘
         │
         ▼ delegates to
┌──────────┐  ┌────────────┐  ┌──────────┐  ┌──────────────┐
│ Planner  │  │ Researcher │  │  Critic  │  │    Writer    │
│          │  │            │  │          │  │              │
│ Breaks   │  │ Tavily +   │  │ Reviews  │  │ Synthesizes  │
│ query    │  │ ArXiv +    │  │ gaps +   │  │ final        │
│ into     │  │ Wikipedia  │  │ requests │  │ markdown     │
│ sub-Qs   │  │            │  │ more     │  │ report with  │
│          │  │            │  │ research │  │ citations    │
└──────────┘  └────────────┘  └──────────┘  └──────────────┘
                                                     │
                                          Job result ──► SQLite
                                                     │
GET /jobs/{id}/status  ──► polls until complete
GET /jobs/{id}/result  ──► returns full markdown report
      │
LangSmith traces entire run (observability)
      │
Streamlit UI reads API ──► displays report

State Machine (LangGraph)

START
  │
[supervisor] ──► planner ──► [supervisor]
                                  │
                             researcher ──► [supervisor]
                                  │
                               critic ──► [supervisor]
                                  │
                    (gaps found AND iterations < 3?)
                         Yes ──► researcher (loop)
                         No  ──► writer ──► [supervisor] ──► END

The supervisor uses Command(goto=...) routing — the LLM decides the flow based on agent outputs, not hardcoded chains. research_iterations >= 3 is enforced in code as a hard safety cap regardless of LLM decisions, preventing infinite loops on the Groq free tier.
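
A minimal sketch of this routing pattern (illustrative only; the real prompt-driven router lives in src/agents/supervisor.py, and the simplified logic here is an assumption):

from typing import Literal
from langgraph.types import Command

MAX_ITERATIONS = 3  # hard cap enforced in code, independent of the LLM's choice

def supervisor(state: dict) -> Command[Literal["planner", "researcher", "critic", "writer"]]:
    # Force the writer once the researcher has looped MAX_ITERATIONS times,
    # even if the critic still reports gaps, preventing an infinite loop.
    if state.get("gaps_identified") and state.get("research_iterations", 0) >= MAX_ITERATIONS:
        return Command(goto="writer")
    # Otherwise follow the LLM's routing decision stored in state.
    return Command(goto=state.get("next_agent", "planner"))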

Two Persistence Layers

Layer 1 — Job Persistence (custom SQLite table)
  jobs table: job_id | query | depth | status | result | error | agent_turns | created_at | updated_at
  status flow: "pending" ──► "running" ──► "complete" | "failed"

Layer 2 — LangGraph Checkpoints (SqliteSaver)
  Saves graph state after every node execution
  thread_id = job_id (same UUID reused across both layers)

Both layers live in data/research.db. Swapping to PostgreSQL is a one-line change in src/persistence/db.py.
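
A sketch of the Layer-1 schema, with column names taken from the diagram above (the exact DDL in src/persistence/db.py may differ):

import sqlite3

def init_db(path: str = "data/research.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS jobs (
            job_id      TEXT PRIMARY KEY,
            query       TEXT NOT NULL,
            depth       TEXT NOT NULL,
            status      TEXT NOT NULL DEFAULT 'pending',  -- pending | running | complete | failed
            result      TEXT,
            error       TEXT,
            agent_turns INTEGER DEFAULT 0,
            created_at  TEXT,
            updated_at  TEXT
        )
    """)
    conn.commit()
    return conn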


Tech Stack

| Component | Choice | Why |
|---|---|---|
| Agent framework | LangGraph supervisor pattern | Native multi-agent, cyclic graph, checkpointing |
| LLM | Groq — Llama 3.3 70B Versatile | Free tier, 500+ tok/s, deterministic routing |
| Web search | Tavily | Semantic search with scored, cited results |
| Paper search | ArXiv | Direct library, rate-limit fix applied |
| General knowledge | Wikipedia | Fast encyclopedic background |
| REST API | FastAPI + uvicorn | Async-native, OpenAPI docs auto-generated |
| Async tasks | FastAPI BackgroundTasks | Zero extra deps, sufficient for single-user |
| Persistence | SQLite + LangGraph SqliteSaver | Zero infra, PostgreSQL-ready |
| Observability | LangSmith | Per-agent token counts, latency, tool traces |
| Containerization | Docker + docker-compose | Reproducible builds, Render-ready |
| Deployment | Render (free tier) | Live public URL |
| Rate limiting | slowapi | Prevents free-tier quota abuse from public endpoint |
| UI | Streamlit | Polls API, renders markdown report |
| Config | python-dotenv | Standard 12-factor app config |

How This Differs From Single-Agent ReAct

This is architecturally different from a standard ReAct agent:

| Dimension | Single-Agent ReAct | Argus (Multi-Agent Supervisor) |
|---|---|---|
| Pattern | One LLM loop with tools | Supervisor orchestrates 4 specialist agents |
| Agent count | 1 | 5 (supervisor + planner + researcher + critic + writer) |
| Interface | Streamlit only | FastAPI REST API + Streamlit |
| Task model | Synchronous, blocking | Async jobs, non-blocking |
| Persistence | Conversation history only | Jobs table + LangGraph checkpoints |
| Critique loop | None | Critic agent identifies gaps, loops back |
| Output | Chat reply | Structured markdown report with citations |
| Control flow | LLM tool_calls | Command(goto=agent_name) routing |
| Observability | None | LangSmith traces every agent turn |
| Deployment | Not containerized | Dockerized, live on Render |

Core skill demonstrated: ReAct = tool use. Supervisor = orchestration. Orchestration is the harder, more senior skill.


Project Structure

Argus/
├── .env                          # API keys — never commit
├── .env.example                  # Template — commit this
├── .gitignore
├── Dockerfile
├── docker-compose.yml
├── render.yaml                   # Render deployment config
├── requirements.txt
├── README.md
├── ARCHITECTURE.md
│
├── data/
│   └── research.db               # SQLite — auto-created on first run
│
└── src/
    ├── api/
    │   ├── main.py               # FastAPI app, CORS, lifespan startup
    │   ├── models.py             # Pydantic request/response models
    │   └── routes/
    │       ├── research.py       # POST /research, GET /jobs/{id}/status+result
    │       └── health.py         # GET /health — Render health check
    │
    ├── agents/
    │   ├── supervisor.py         # LLM routing via Command(goto=...)
    │   ├── planner.py            # Decomposes query into sub-questions
    │   ├── researcher.py         # Calls Tavily + ArXiv + Wikipedia
    │   ├── critic.py             # Identifies research gaps
    │   └── writer.py             # Synthesizes final markdown report
    │
    ├── graph/
    │   ├── state.py              # ResearchState TypedDict + add_messages reducer
    │   └── pipeline.py           # Builds + compiles LangGraph StateGraph
    │
    ├── tools/
    │   ├── tavily_tool.py        # Web search (Tavily)
    │   ├── arxiv_tool.py         # Paper search (ArXiv, rate-limit fix applied)
    │   └── wikipedia_tool.py     # Background knowledge (Wikipedia)
    │
    ├── persistence/
    │   ├── db.py                 # SQLite CRUD — create_job, update_job_status, get_job
    │   └── checkpointer.py       # LangGraph SqliteSaver
    │
    └── ui/
        └── streamlit_app.py      # Calls FastAPI REST API, polls + renders report

API Reference

POST /research

Submit a new research job. Returns immediately with a job_id.

// Request
{
  "query": "What are the latest breakthroughs in protein folding AI?",
  "depth": "standard"
}
// depth: "quick" (~20s, 2 sub-questions, web only)
//        "standard" (~45s, 3 sub-questions, web + arxiv + wikipedia)
//        "deep" (~90s, 5 sub-questions, all tools, more results)

// Response — 202 Accepted
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "estimated_seconds": 45
}

GET /jobs/{job_id}/status

Poll for job progress.

{
  "job_id": "550e8400-...",
  "status": "running",
  "created_at": "2026-02-26T07:00:00Z",
  "updated_at": "2026-02-26T07:00:32Z"
}
// status values: pending | running | complete | failed

GET /jobs/{job_id}/result

Fetch the completed report.

{
  "job_id": "550e8400-...",
  "query": "What are the latest breakthroughs in protein folding AI?",
  "status": "complete",
  "report": "## Protein Folding AI: 2025-2026 Breakthroughs\n\n...",
  "sources": ["https://...", "https://arxiv.org/abs/..."],
  "agent_turns": 4,
  "error": null,
  "created_at": "2026-02-26T07:00:00Z",
  "updated_at": "2026-02-26T07:00:38Z"
}

GET /health

{ "status": "ok", "version": "1.0.0" }

Interactive docs available at /docs (Swagger UI auto-generated by FastAPI).
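
A minimal end-to-end client, assuming a local instance on port 8000 and the requests library:

import time
import requests

BASE = "http://localhost:8000"

# Submit: returns immediately with a job_id (202 Accepted)
job = requests.post(f"{BASE}/research", json={
    "query": "What are the latest breakthroughs in protein folding AI?",
    "depth": "standard",
}).json()

# Poll until the job leaves the pending/running states
while True:
    status = requests.get(f"{BASE}/jobs/{job['job_id']}/status").json()["status"]
    if status in ("complete", "failed"):
        break
    time.sleep(5)

# Fetch the full markdown report
result = requests.get(f"{BASE}/jobs/{job['job_id']}/result").json()
print(result["report"])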


Shared State (ResearchState)

All agents read from and write back to a single TypedDict that flows through the graph:

class ResearchState(TypedDict):
    query: str                               # Original research query
    depth: str                               # "quick" | "standard" | "deep"
    messages: Annotated[list, add_messages]  # Full message history — add_messages REDUCER
    sub_questions: list[str]                 # Set by Planner
    research_findings: list[str]             # Accumulated by Researcher
    gaps_identified: list[str]               # Set by Critic
    research_iterations: int                 # Incremented by Researcher — loop guard
    final_report: str                        # Set by Writer
    sources: list[str]                       # Accumulated throughout
    next_agent: str                          # Set by Supervisor for routing

messages uses the add_messages reducer — every agent appends to the history rather than overwriting it. All other fields use default last-write-wins replacement.
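
A standalone demonstration of the reducer's append semantics (the graph itself wires this up through the Annotated type above):

from langchain_core.messages import AIMessage, HumanMessage
from langgraph.graph.message import add_messages

history = add_messages([], [HumanMessage("Plan this query")])
history = add_messages(history, [AIMessage("Sub-questions: ...")])
print(len(history))  # prints 2; messages are appended, never overwritten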


Setup & Running Locally

Prerequisites

  • Python 3.11+
  • Docker Desktop (for containerized run)
  • API keys: Groq (free), Tavily (free), LangSmith (free, optional)

1. Clone and configure

git clone https://github.com/noviciusss/Argus.git
cd Argus
cp .env.example .env
# Edit .env and add your API keys

2. Run with Docker (recommended)

docker-compose up --build
  • Streamlit UI: http://localhost:8501
  • API docs: http://localhost:8000/docs
  • Health check: http://localhost:8000/health

3. Run without Docker

pip install -r requirements.txt

# Terminal 1 — API
python -m uvicorn src.api.main:app --reload --port 8000

# Terminal 2 — UI
python -m streamlit run src/ui/streamlit_app.py

Always run from the project root using python -m so src.* imports resolve correctly.


Rate Limiting

The public /research endpoint is rate-limited using slowapi to prevent free-tier API quota abuse. Since the Render URL is publicly accessible, an unprotected endpoint could be hammered by anyone, burning through Groq and Tavily free-tier credits.

Current limits:

  • POST /research — 5 requests per IP per hour
  • GET /jobs/* — unlimited (read-only, no API cost)
  • GET /health — unlimited (required for UptimeRobot pings)

If the limit is exceeded, the API returns:

HTTP 429 Too Many Requests
{ "error": "Rate limit exceeded: 5 per 1 hour" }

The limit is enforced per IP address. For local development and Docker, the limit is effectively not hit under normal use. To adjust limits, change the @limiter.limit("5/hour") decorator in src/api/routes/research.py.
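
For reference, a minimal sketch of typical slowapi wiring in FastAPI; the actual setup in src/api/main.py and src/api/routes/research.py may differ in detail:

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)  # keyed per client IP
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/research")
@limiter.limit("5/hour")
async def submit_research(request: Request):  # slowapi requires the Request argument
    ...  # create the job, schedule the background task, return 202 with job_id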

Additional protection — hard caps on API dashboards:

  • Groq → Usage Limits → set monthly token cap
  • Tavily → Dashboard → set monthly search cap

Even if the rate limiter is bypassed, the upstream API caps act as a second defense layer.


Design Decisions & Trade-offs

Why multi-agent instead of one big ReAct agent?

A single ReAct agent conflates planning, researching, critiquing, and writing — each has different failure modes and requires different prompting strategies. With one agent:

  • Planning prompt interferes with tool-calling prompt
  • No clean separation of concerns for debugging
  • The critique loop is architecturally impossible — the agent can't objectively review its own just-completed output in the same turn

Separating into specialist agents allows independent prompts, independent error handling, and a dedicated Critic that reviews findings with fresh context before writing begins.

Why async jobs instead of a streaming/blocking response?

Research takes 30–90 seconds. Standard HTTP requests time out at ~30 seconds in most clients, browsers, and load balancers. The async job pattern (submit → poll → fetch) decouples request handling from computation — this is the standard production pattern for any long-running AI task. It's the same pattern used by OpenAI's Batch API and Anthropic's async endpoints.

An alternative would be Server-Sent Events (SSE) for real-time streaming — listed as a future improvement.

Why SQLite instead of PostgreSQL?

This is a single-user, single-process deployment. SQLite handles this workload easily with zero infrastructure overhead — no separate database container, no connection pool, no migration tooling needed. The swap to PostgreSQL is explicitly one line in src/persistence/db.py (the connection string). The rest of the code is identical. This was a deliberate design choice to demonstrate production thinking on a dev setup.

Why Groq (Llama 3.3 70B) instead of GPT-4 or Claude?

Groq's free tier provides ~500 tokens/second — fast enough that agent turns feel snappy rather than laggy. For supervisor routing (which needs precise instruction-following), Llama 3.3 70B is sufficiently capable. For a demo project with real usage, paying $0 vs paying per token matters. The LLM is abstracted behind LangChain's ChatGroq interface — swapping to GPT-4o is a one-line change in each agent file.
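
Because the agents depend only on LangChain's chat-model interface, the swap looks roughly like this (a sketch; both constructors are real LangChain classes, the model names are examples):

from langchain_groq import ChatGroq
# from langchain_openai import ChatOpenAI

llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0)
# llm = ChatOpenAI(model="gpt-4o", temperature=0)  # the one-line swap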

Why FastAPI BackgroundTasks instead of Celery/Redis?

FastAPI's BackgroundTasks requires zero extra infrastructure — no Redis container, no worker process, no broker configuration. For single-user usage it works perfectly. The trade-off is that background tasks are in-process, so if the server restarts mid-research, the job is lost (status stays "running" forever in the DB). For a demo portfolio project this is acceptable. Celery + Redis is listed as the production upgrade path.
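
A sketch of how the route hands off to BackgroundTasks; create_job comes from src/persistence/db.py per the project tree, while run_pipeline and the argument shapes are hypothetical stand-ins:

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

@app.post("/research", status_code=202)
async def submit_research(body: dict, background_tasks: BackgroundTasks):
    job_id = create_job(body["query"], body["depth"])  # writes the "pending" row (signature assumed)
    background_tasks.add_task(run_pipeline, job_id)    # runs in-process after the response is sent
    return {"job_id": job_id, "status": "pending"}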

Why is the checkpointer using a raw SQLite connection instead of from_conn_string()?

SqliteSaver.from_conn_string() returns a context manager designed for with blocks — it closes the connection when exiting the context. Since the graph lives for the entire app lifetime (built once at module load), the connection must stay open. Passing a raw sqlite3.connect() connection directly to SqliteSaver(conn) keeps the connection open for the app's lifetime.
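
In code, the difference looks roughly like this (a sketch; see src/persistence/checkpointer.py for the real version):

import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

# Context-manager form: the connection closes when the block exits,
# which is wrong for a graph that lives for the whole app lifetime.
# with SqliteSaver.from_conn_string("data/research.db") as saver: ...

# App-lifetime form: the connection stays open as long as the process runs.
conn = sqlite3.connect("data/research.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)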

Why does depth="quick" skip ArXiv and Wikipedia?

ArXiv's rate-limit fix requires a 3-second sleep between paper fetches. For a "quick" research run, adding 6–9 seconds of sleep per iteration defeats the purpose. Quick mode uses Tavily web search only (3 results) — fast but sufficient for general queries. Standard and deep modes enable all three tools.
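
A hypothetical mapping of the depth presets, using only the values documented in the API reference above; the actual structure in the code may differ:

DEPTH_CONFIG = {
    "quick":    {"sub_questions": 2, "tools": ["tavily"]},                        # web only, 3 results
    "standard": {"sub_questions": 3, "tools": ["tavily", "arxiv", "wikipedia"]},
    "deep":     {"sub_questions": 5, "tools": ["tavily", "arxiv", "wikipedia"]},  # more results per tool
}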

Render free tier cold starts

Render's free tier spins containers down after 15 minutes of inactivity. The first request after a cold start takes 30–60 seconds to respond — this is a Render free-tier limitation, not an application bug. Subsequent requests are fast. The fix is upgrading to a paid Render instance ($7/month) or using a cron job to ping /health every 14 minutes to keep the container warm.

What would you add with more time?
| Improvement | Why |
|---|---|
| Redis + Celery | Proper async task queue — jobs survive server restarts |
| PostgreSQL | Multi-user support, persistent jobs across deploys |
| Server-Sent Events (SSE) | Real-time streaming of agent progress instead of polling |
| PDF export | Download research reports as formatted PDFs |
| Rate limiting middleware | Done — slowapi implemented, 5 req/hour per IP on POST /research |
| LLM-as-Judge evaluation | Score report quality using the DoCopilot eval pattern |
| Authentication | API key auth for the REST API |
| Report caching | Same query within 24h returns cached result, no API cost |

Observability

Every LLM call, tool call, and agent turn is automatically traced in LangSmith — no code instrumentation needed, just env vars.

Set in .env:

LANGSMITH_API_KEY=your_key
LANGSMITH_PROJECT=deep-research-engine
LANGSMITH_TRACING_V2=true

After a research run, open the deep-research-engine project at smith.langchain.com to see:

  • Per-agent latency breakdown
  • Token counts per LLM call
  • Tool call inputs/outputs (Tavily queries, ArXiv results)
  • Full state at each node transition
  • Error traces with full context if any agent fails

Environment Variables

| Variable | Required | Description |
|---|---|---|
| GROQ_API_KEY | Yes | console.groq.com — free tier |
| TAVILY_API_KEY | Yes | tavily.com — free tier |
| LANGSMITH_API_KEY | Recommended | smith.langchain.com — free tier |
| LANGSMITH_PROJECT | Recommended | Set to deep-research-engine |
| LANGSMITH_TRACING_V2 | Recommended | Set to true |
| API_BASE | Docker only | Auto-set to http://api:8000 in compose |
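
Putting the table together, a plausible .env based on the values above (the repo's .env.example is authoritative; API_BASE is omitted because compose sets it automatically):

GROQ_API_KEY=your_groq_key
TAVILY_API_KEY=your_tavily_key
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=deep-research-engine
LANGSMITH_TRACING_V2=true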

Related Projects

| Project | Pattern | What it proved |
|---|---|---|
| MultiTool_Research | Single-agent ReAct | Tool use, conversation memory |
| DoCopilot | RAG + LLM-as-Judge | Document QA, evaluation pipelines |
| Argus (this) | Multi-agent Supervisor | Orchestration, async APIs, production deployment |

License

MIT
