ARCANE — Autonomous Research & Context Acquisition Network Engine

An OpenClaw-inspired autonomous agent that researches founders and CEOs using LLM-driven planning, web scraping, and iterative reflection. Zero cost — runs entirely on Groq's free tier + DuckDuckGo.

What It Does

Given a target name (e.g. "Sam Altman"), ARCANE autonomously:

Plans targeted search queries using an LLM
Searches DuckDuckGo and extracts snippet facts immediately
Scrapes the most relevant web pages
Extracts structured facts (categorized by type) via LLM
Reflects — decides whether to keep researching or stop
Compiles a comprehensive JSON + Markdown report

Results include: early life, career journey, companies founded, achievements, controversies, net worth, personal life, notable quotes, and vision/philosophy — with a confidence score and source list.

Project Structure

AI-Agent/
├── main.py               # CLI entry point
├── app.py                # Flask web server with real-time SSE streaming
├── agent.py              # ResearchAgent — core agentic loop
├── llm/
│   └── groq_client.py    # Groq API wrapper (plan, extract, reflect, compile)
├── tools/
│   ├── search.py         # DuckDuckGo search (text + news)
│   ├── scraper.py        # HTTP scraper with BeautifulSoup content extraction
│   └── memory.py         # AgentMemory — facts, URL dedup, session persistence
├── output/
│   └── formatter.py      # Saves reports as JSON and Markdown
├── templates/
│   └── index.html        # ARCANE web UI — dark terminal aesthetic, live log stream
├── sessions/             # Saved research sessions (gitignored)
├── output/               # Generated reports (gitignored)
├── requirements.txt
├── Procfile              # For deployment (e.g. Render/Railway)
└── .env                  # Put your GROQ_API_KEY here (gitignored)

Setup

1. Clone & create virtualenv

git clone https://github.com/Rohan29-De/AI-Agent
cd AI-Agent
python3 -m venv venv
source venv/bin/activate

2. Install dependencies

pip install -r requirements.txt

3. Get a free Groq API key

Sign up at console.groq.com — it's free.

4. Create `.env`

echo "GROQ_API_KEY=your_key_here" > .env

Usage

CLI

# Basic research
python main.py "Sam Altman"

# More iterations = deeper research
python main.py "Jensen Huang" --iterations 5

# Resume a previous session
python main.py "Elon Musk" --resume

# Custom output directory
python main.py "Sundar Pichai" --output-dir results/

CLI flags:

Flag	Default	Description
`--iterations` / `-i`	3	Number of research iterations
`--urls-per-iter` / `-u`	4	Max URLs to scrape per iteration
`--resume` / `-r`	off	Resume from saved session
`--output-dir` / `-o`	`output/`	Where to save reports
`--no-markdown`	off	Skip markdown report

Web UI (ARCANE)

python app.py

Open http://localhost:5000. Enter a name, choose iterations, click DEPLOY. The terminal panel streams the agent's log in real-time; the report appears on the right when complete. Supports JSON and Markdown download.

Output

Reports are saved to output/ as both JSON and Markdown:

output/
├── sam_altman_20260310_161052.json
└── sam_altman_20260310_161052.md

JSON report structure:

{
  "name": "Samuel H. Altman",
  "title": "CEO of OpenAI",
  "summary": "...",
  "sections": {
    "early_life_education": "...",
    "career_journey": "...",
    "entrepreneurial_ventures": "...",
    "key_achievements": ["..."],
    "leadership_style": "...",
    "notable_quotes": ["..."],
    "controversies_challenges": "...",
    "net_worth_financials": "...",
    "personal_life": "...",
    "vision_philosophy": "..."
  },
  "sources": ["https://..."],
  "confidence_score": 85,
  "research_gaps": ["..."],
  "metadata": {
    "target": "Sam Altman",
    "total_facts": 47,
    "iterations_run": 3,
    "urls_scraped": 8,
    "categories": {"career": 18, "personal": 6}
  }
}

Architecture

Agentic Loop (per iteration)

PLAN  →  SEARCH  →  SCRAPE  →  EXTRACT  →  REFLECT
  ↑                                             |
  └─────────── (loop until done) ───────────────┘
                                                |
                                           COMPILE REPORT

Key Design Decisions

Snippet-first fact gathering: search result snippets are stored as facts immediately, even if the page can't be scraped. This ensures data is gathered even when sites block scrapers.
LLM-structured JSON: all LLM calls request JSON output with a defined schema via response_format={"type":"json_object"}.
Fuzzy deduplication: AgentMemory uses word-overlap similarity (>80% threshold) to avoid storing near-identical facts.
Domain filtering: known paywalls/social media (WSJ, NYT, Twitter, etc.) are skipped. High-value domains (Wikipedia, Forbes, Crunchbase, TechCrunch) are prioritized.
Session persistence: each research session is saved to sessions/<target>.json and can be resumed with --resume.
SSE streaming (web UI): the Flask app runs the agent in a background thread and pushes structured events to the browser via Server-Sent Events.

LLM Functions (`llm/groq_client.py`)

Function	Purpose
`plan_searches(target, context, iteration)`	Generate 3–5 targeted search queries
`extract_facts(text, target, url)`	Extract & categorize facts from scraped content
`should_continue(summary, target, iteration, max)`	Decide whether to keep researching
`compile_report(target, all_facts)`	Synthesize final structured report

All functions use a shared chat() helper that injects a SYSTEM_PROMPT and supports response_format="json".

Known Issues & Planned Improvements

Current Limitations

DuckDuckGo regional/irrelevant results: Short names can match unrelated topics (e.g. "SAM" matches Meta's Segment Anything Model). Mitigated with region='wt-wt' and more specific query phrasing.
JS-rendered pages return 0 chars: Sites like MSN and Benzinga are React apps — requests + BeautifulSoup can't execute JavaScript. Fix: integrate Playwright.
Repeated URLs across iterations: When DuckDuckGo returns the same URLs every iteration, later iterations scrape nothing new. Fix: always include Wikipedia as a first-pass source, and inject direct URLs for well-known people.

Planned Next Steps

Add Playwright for JS-rendered page scraping
Always scrape Wikipedia directly as first source
Wrap target name in quotes in search queries (e.g. "Sam Altman" biography)
Swappable LLM backend (OpenAI, Ollama, Gemini)
PDF export using WeasyPrint
Batch mode: research multiple targets in one run
Deploy to Render/Railway (Procfile is already set up)

Dependencies

Package	Purpose
`groq`	LLM inference (llama-3.3-70b-versatile, free tier)
`duckduckgo-search`	Web search without API key
`requests` + `beautifulsoup4` + `lxml`	Web scraping
`rich`	Terminal UI (panels, tables, colored output)
`flask`	Web server for ARCANE UI
`python-dotenv`	Load `GROQ_API_KEY` from `.env`

Cost

Zero. Groq's free tier provides generous rate limits for llama-3.3-70b-versatile. DuckDuckGo requires no API key.

Inspired by OpenClaw autonomous agent architecture.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ARCANE — Autonomous Research & Context Acquisition Network Engine

What It Does

Project Structure

Setup

1. Clone & create virtualenv

2. Install dependencies

3. Get a free Groq API key

4. Create `.env`

Usage

CLI

Web UI (ARCANE)

Output

Architecture

Agentic Loop (per iteration)

Key Design Decisions

LLM Functions (`llm/groq_client.py`)

Known Issues & Planned Improvements

Current Limitations

Planned Next Steps

Dependencies

Cost

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
llm		llm
templates		templates
tools		tools
.gitignore		.gitignore
Procfile		Procfile
README.md		README.md
agent.py		agent.py
app.py		app.py
main.py		main.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

ARCANE — Autonomous Research & Context Acquisition Network Engine

What It Does

Project Structure

Setup

1. Clone & create virtualenv

2. Install dependencies

3. Get a free Groq API key

4. Create .env

Usage

CLI

Web UI (ARCANE)

Output

Architecture

Agentic Loop (per iteration)

Key Design Decisions

LLM Functions (llm/groq_client.py)

Known Issues & Planned Improvements

Current Limitations

Planned Next Steps

Dependencies

Cost

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

4. Create `.env`

LLM Functions (`llm/groq_client.py`)

Packages