Hack Monty — Autonomous Sandbox Security Assessment

An LLM-driven autonomous hacking loop targeting Pydantic's $10,000 Hack Monty bounty. It is built on the autoresearch pattern and evolved over four versions, first into XBOW-style parallel swarms and finally into the current tokenworm + MCP architecture.

Architecture (V4)

                    ┌──────────────────────────────────────┐
                    │   tokenworm (LLM agent harness)       │
                    │                                       │
                    │  Provider: ollama_native              │
                    │  Skills: 4 role-focused SKILL.md      │
                    │  Sandbox: bwrap + network             │
                    │                                       │
                    │         MCP Client (stdio)             │
                    └───────────────┬──────────────────────┘
                                    │
                    ┌───────────────▼──────────────────────┐
                    │    hackmonty_mcp_server.py            │
                    │    17 boundary tools                  │
                    │    (run, evaluate, bandit, state)     │
                    └───────────────┬──────────────────────┘
                                    │
                    ┌───────────────▼──────────────────────┐
                    │  hackmonty.com    Ollama Cloud       │
                    │  GitHub Issues    Filesystem (notes) │
                    └──────────────────────────────────────┘
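
The diagram shows tokenworm talking to `hackmonty_mcp_server.py` as an MCP client over stdio. As a rough illustration of what one tool invocation looks like on the wire, the sketch below builds an MCP `tools/call` request (a JSON-RPC 2.0 message) and frames it as a single newline-terminated line, which is how the MCP stdio transport delimits messages. The tool name `run` comes from the diagram, but the `code` argument and both helper names are assumptions made for this example, not the server's documented schema; see MCP.md for the real tool signatures.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def encode_stdio(message):
    """Serialize one message for the stdio transport.

    Messages are newline-delimited and must not contain raw newlines;
    json.dumps escapes any newline inside a string value as "\\n".
    """
    return json.dumps(message, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    # Hypothetical call: the "code" argument name is an assumption.
    request = make_tool_call(1, "run", {"code": "print('hello')\n"})
    print(encode_stdio(request), end="")
```

A real client would first perform the MCP `initialize` handshake before sending any `tools/call` request; that step is omitted here to keep the framing visible.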

Quick Start

Production (tokenworm Zig binary)

# Requires: ollama daemon running (ollama serve)
# Cloud models work transparently — no pull needed
export USER_SECRET=your-passphrase

./run.sh            # 500 iterations
./run.sh 20         # 20 iterations (test)
./run.sh -i         # interactive REPL

Python SDK showcase

# Install tokenworm Python SDK
# (paths below are from the author's machine; adjust them to your own checkouts)
cd /home/dipankar/Github/tokenworm/bindings/python && pip install -e .

# Run the demo (uses InlineSkill, MCP, and the native Ollama Cloud API)
cd /home/dipankar/Github/hackmonty
uv run python run.py         # 500 iterations
uv run python run.py 20      # 20 iterations
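
Both launchers take an optional iteration count and fall back to 500 when it is omitted. A minimal sketch of that convention with `argparse` follows; it is an illustration only, and the `parse_args` helper is hypothetical rather than the actual contents of `run.py`.

```python
import argparse


def parse_args(argv):
    """Parse an optional positional iteration count (default: 500)."""
    parser = argparse.ArgumentParser(description="hypothetical launcher sketch")
    parser.add_argument(
        "iterations",
        nargs="?",       # the argument may be omitted
        type=int,
        default=500,     # matches the documented default run length
        help="number of agent iterations to run",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    import sys

    args = parse_args(sys.argv[1:])
    print(f"running {args.iterations} iterations")
```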

MCP server (SSE remote mode)

uv run python -B hackmonty_mcp_server.py --sse --port 8765

Results

750+ exploit attempts across 4 orchestrator versions
0 sandbox escapes found
1 latent unsafe bug found: heap_read_boxed provenance mismatch
43 unsafe blocks audited, 9 GHSA advisories reviewed, 6 CPython divergences documented
Full report: REPORT.md | Bounty submission: SUBMISSION.md
MCP documentation: MCP.md

Files

File	Purpose
`run.sh`	Production launcher (tokenworm Zig binary)
`run.py`	Python SDK showcase (tokenworm Python SDK)
`hackmonty_mcp_server.py`	MCP server — 17 boundary tools
`orchestrator.py`	V3 async swarm (standalone, no MCP)
`tokenworm/config.json`	Self-contained config (provider, MCP, skills, hooks, sandbox)
`skills/`	4 SKILL.md files (orchestrator, bandit-master, analyst, coder)
`program.md`	Agent instructions + attack template documentation
`agent.py`	LLM driver (minimax + deepseek-v4-flash for V3)
`bandit.py`	UCB1 bandit selection
`evaluate.py`	0-5 scoring + context enrichment
`hackmonty_client.py`	hackmonty.com API client (sync + async)
`fuzz_snapshots.py`	Snapshot protocol fuzzer (44 tests)
`source_scanner.py`	Monty Rust source static analysis
`REPORT.md`	12-section paper-level security assessment
`SUBMISSION.md`	Bounty submission (unsafe provenance bug)
`MCP.md`	MCP server documentation
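
The table lists `bandit.py` as UCB1 bandit selection and `evaluate.py` as a 0-5 scorer. For readers unfamiliar with UCB1, the sketch below shows the standard rule: try each arm once, then pick the arm with the highest mean reward plus an exploration bonus of sqrt(2 ln N / n_i). It is a generic textbook version, not the code in `bandit.py`; the function name, the idea that arms correspond to attack templates, and the division of the 0-5 score by 5 to get a reward in [0, 1] are all assumptions made for illustration.

```python
import math


def ucb1_select(counts, rewards):
    """Return the index of the arm to try next under UCB1.

    counts[i]  -- how many times arm i has been tried
    rewards[i] -- sum of rewards observed for arm i, each in [0, 1]
    """
    # Try every arm once before applying the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    total = sum(counts)

    def score(arm):
        mean = rewards[arm] / counts[arm]
        bonus = math.sqrt(2.0 * math.log(total) / counts[arm])
        return mean + bonus

    return max(range(len(counts)), key=score)


if __name__ == "__main__":
    # Hypothetical arms: three attack templates with summed, normalized scores.
    counts = [10, 10, 1]
    rewards = [4.0 / 5 * 10, 1.0 / 5 * 10, 0.0]
    print("next arm:", ucb1_select(counts, rewards))
```

The bonus term shrinks as an arm is tried more often, so rarely tried templates keep getting revisited even when their average score so far is low.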

License

MIT — See LICENSE

