A curated GitHub map of projects for building, operating, and governing agent harnesses, with skills preserved and integrated as a first-class harness layer.
An agent harness is the control plane around an agent:
- prompt and context scaffolding
- planning, state, checkpoints, and recovery
- tool and protocol integration
- isolated execution environments
- verification, reviewability, and guardrails
- reusable skills and behavior packs
This repository started life as a Claude Skills catalog. That history remains intact here, but it is now organized as part of a single Awesome Agent Harness map: skills are treated as a core harness layer rather than a detached appendix.
- Scope
- Search Method
- Selection Principles
- If You Only Read Five Repos
- Harness Layer Map
- 1. Harness-First Runtimes and Coding Agents
- 2. Frameworks, Planning, Orchestration, and Agent Protocols
- 3. Skills and Reusable Behavior Packs
- 4. MCP and Capability Fabric
- 5. Memory, State, and Context Systems
- 6. Browser, Sandbox, Execution, and Operator Surfaces
- 7. Observability, Evals, and Guardrails
- 8. Reference Harness Compositions
- Related Research in This Repo
- Contributing
This list focuses on high-signal, open-source GitHub projects that are directly useful when building an agent harness.
It does not attempt a mathematically exhaustive list of every repo in the ecosystem. That is no longer practical, especially for:
- MCP servers
- individual skill packs
- one-off app wrappers around existing agents
For long-tail discovery, this README points to registries and official indexes.
GitHub-wide search terms used in this sweep included:
agent harness
harness engineering
coding agent
agent runtime
stateful agent
model context protocol
mcp server
agent skills
browser agent
agent eval
agent guardrails
secure agent sandbox
Notes:
- This sweep was updated on March 22, 2026.
- Star counts in the ~100-star layer notes below are GitHub API snapshots from March 22, 2026 and will change over time.
- For this README, ~100 stars means roughly 80-160 stars, because exact 100-star filtering is too unstable to be useful in practice.
- As of March 22, 2026, OpenDevin/OpenDevin has effectively moved to All-Hands-AI/OpenHands.
- As of March 22, 2026, openclaw/openclaw resolves to clawdbot/clawdbot.
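The search terms and the 80-160 star band above can be combined into repository search queries. The following is a minimal sketch, not a record of how the sweep was actually run: it only builds URLs for GitHub's public repository search endpoint, and the helper name `build_search_url` and the shortened `SEARCH_TERMS` list are illustrative assumptions.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

# A subset of the search terms listed above, shortened for brevity.
SEARCH_TERMS = ["agent harness", "mcp server", "agent guardrails"]

# This README treats "~100 stars" as roughly 80-160 stars.
STAR_BAND = (80, 160)


def build_search_url(term: str, star_band: Optional[Tuple[int, int]] = None) -> str:
    """Return a GitHub repository-search URL for one term (hypothetical helper)."""
    query = f'"{term}"'
    if star_band is not None:
        low, high = star_band
        # GitHub search syntax for an inclusive star range.
        query += f" stars:{low}..{high}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": 50}
    return "https://api.github.com/search/repositories?" + urlencode(params)


if __name__ == "__main__":
    for term in SEARCH_TERMS:
        print(build_search_url(term, STAR_BAND))
```

Dropping the `star_band` argument gives the unfiltered query, which is closer to what the main layer tables need.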
Projects are prioritized when they do one or more of the following:
- implement a full harness runtime
- expose a critical harness subsystem such as MCP, memory, sandboxing, evals, or skills
- act as an official spec, SDK, or registry
- provide patterns that are widely reusable across agent stacks
Projects are deprioritized when they are:
- thin wrappers around proprietary services with little reusable harness value
- abandoned experiments with no clear ecosystem relevance
- one-off MCP servers that are better represented by a registry
- openai/codex - terminal-native coding agent shell
- langchain-ai/deepagents - explicit open-source harness with planning, filesystem, subagents, MCP, and HITL
- modelcontextprotocol/modelcontextprotocol - the protocol layer that now defines most tool integration work
- github/github-mcp-server - the most important MCP server for coding workflows
- promptfoo/promptfoo - simple, practical evals and red-teaming for agent outputs
This README is organized by harness layers rather than by vendor brand or historical repository state.
The main layers in this map are:
- Harness-First Runtimes and Coding Agents: agent shells, coding agents, and repo-native execution loops
- Frameworks, Planning, Orchestration, and Agent Protocols: orchestration frameworks, workflow shapers, and agent interoperability layers
- Skills and Reusable Behavior Packs: portable instructions, workflow packs, community skills, creation guides, and skill operations
- MCP and Capability Fabric: tool protocols, MCP specs, SDKs, registries, and capability management
- Memory, State, and Context Systems: persistent memory, state graphs, and long-running context layers
- Browser, Sandbox, Execution, and Operator Surfaces: browser control, isolated execution, safe sandboxes, and operator-facing shells
- Observability, Evals, and Guardrails: tracing, evals, reviewability, policy, and safety tooling
- Reference Harness Compositions: cross-layer build stacks that combine the layers above into practical systems
The former standalone ~100-star section has been redistributed into the relevant layers below as Emerging Repos in This Layer, so the whole document stays centered on Awesome Agent Harness.
| Project | Why it matters for agent harnesses |
|---|---|
| openai/codex | Lightweight coding agent for the terminal; a strong reference for CLI-native harness design and repo-driven workflows. |
| langchain-ai/deepagents | One of the clearest open-source agent harness repos: planning, filesystem, shell, subagents, summarization, MCP, and HITL. |
| anomalyco/opencode | Open-source coding agent with built-in build and plan modes, provider-agnostic design, and strong TUI ergonomics. |
| All-Hands-AI/OpenHands | End-to-end software engineering agent platform; useful as a larger harness/control-plane reference. |
| Aider-AI/aider | Terminal pair-programming agent; a mature reference for repo-aware edit/apply/test loops. |
| continuedev/continue | Open-source CLI and IDE agent system with TUI and headless modes for background workflows. |
| cline/cline | IDE-native autonomous coding agent with explicit human approval, browser use, checkpoints, and MCP extension points. |
| block/goose | Extensible local agent that can install, execute, edit, and test with any LLM; good for MCP-heavy local harness patterns. |
| SWE-agent/SWE-agent | Research-heavy software engineering agent that stays useful as a harness reference for tool bundles, configs, and benchmarkable runs. |
| SWE-agent/mini-swe-agent | Minimal baseline harness showing how far a simple bash-first loop can go without a giant scaffold. |
| clawdbot/clawdbot | OpenClaw's current repo; useful if you care about always-on personal agent control planes, skills, channels, and device actions. |
| Project | Why it matters for agent harnesses |
|---|---|
| langchain-ai/langgraph | Low-level orchestration framework for long-running, stateful, controllable agents. |
| openai/openai-agents-python | Lightweight multi-agent workflow SDK with handoffs, sessions, tracing, and guardrails. |
| microsoft/autogen | Mature multi-agent framework with event-driven runtime and a large extension ecosystem. |
| crewAIInc/crewAI | Lean role-based multi-agent orchestration framework with a big ecosystem and many examples. |
| agno-agi/agno | Full-stack agent system with runtime, control plane, memory, knowledge, MCP, A2A, and eval hooks. |
| pydantic/pydantic-ai | Strong choice for typed, production-grade agent workflows with durable execution and approval hooks. |
| letta-ai/letta | Stateful agent platform focused on advanced memory and persistent agent identity over time. |
| lastmile-ai/mcp-agent | MCP-native framework that combines simple workflow patterns with durable execution. |
| a2aproject/A2A | Open Agent2Agent protocol for agent interoperability beyond tool calling. |
| anthropics/claude-agent-sdk-python | SDK for embedding Claude Agent / Claude Code style behavior into programmable workflows. |
These repos were previously listed under Emerging ~100-Star Harness Projects. They are now placed here because they shape planning, orchestration, repo instructions, or runtime control.
| Project | Stars | Layer | Why it matters |
|---|---|---|---|
| trevor-nichols/agentrules-architect | 109 | Prompt scaffold / planning | Generates AGENTS.md / CLAUDE.md style rule files plus ExecPlan-oriented harness structure. |
| Codename-Inc/spectre | 116 | Workflow scaffold | Encodes /Scope -> /Plan -> /Execute -> /Clean -> /Test -> /Evaluate as a reusable coding workflow harness. |
| sudocode-ai/sudocode | 248 | Repo workflow layer | Slightly above the target range, but useful as a repo-local orchestration layer that lives with the codebase itself. |
| SethGammon/Citadel | 125 | Orchestration runtime | Claude Code team harness with routing, worktrees, lifecycle hooks, circuit breakers, and campaign persistence. |
| go-a2a/adk-go | 99 | Agent runtime / deployment | Go toolkit for building, evaluating, and deploying controlled agent systems. |
| Mercor-Intelligence/archipelago | 134 | Execution harness / eval | Harness for running and evaluating AI agents against RL environments. |
| jpicklyk/task-orchestrator | 170 | Task orchestration | Slightly above the target range, but notable for persistent work tracking and context storage across sessions and agents. |
Skills are a first-class harness layer. They package reusable instructions, scripts, templates, and workflows that sit between raw prompts and external tools. In a real harness, skills help turn repeated behavior into portable, versioned, reviewable assets.
This section intentionally preserves the repository's historical Claude Skills catalog, but it now lives inside the Awesome Agent Harness map as the reusable behavior layer rather than a detached second theme.
| Project | Why it matters for agent harnesses |
|---|---|
| anthropics/skills | Official public repo for Claude skills, skill examples, templates, and the skill spec. |
| openai/skills | Codex-focused skills catalog; useful for understanding the cross-agent skills pattern. |
| vercel-labs/skills | Cross-agent skills CLI that installs skills into Codex, Claude Code, Cursor, OpenCode, OpenClaw, and more. |
| obra/superpowers | A full workflow system built from composable skills, plans, subagents, worktrees, and review loops. |
| trailofbits/skills | Security-heavy skill marketplace for audits, CodeQL, Semgrep, diff review, and secure dev workflows. |
| expo/skills | Official Expo team skill pack for building, deploying, and debugging Expo apps. |
Skills employ a progressive disclosure architecture for efficiency:
- Metadata loading (~100 tokens): Claude scans available Skills to identify relevant matches
- Full instructions (<5k tokens): Load when Claude determines the Skill applies
- Bundled resources: Files and executable code load only as needed
This design allows multiple Skills to remain available without overwhelming Claude's context window.
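The three loading stages can be sketched in a few lines of Python. This is an illustrative model only, not Anthropic's implementation: the `Skill` class, `is_relevant`, and `build_context` are hypothetical names, and the keyword match is a stand-in for the model's own relevance judgment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Skill:
    """Hypothetical in-memory model of a skill; not Anthropic's implementation."""

    name: str
    description: str  # Stage 1: small metadata that is always in context
    load_instructions: Callable[[], str]  # Stage 2: called only when relevant
    # Stage 3: bundled resources, each fetched on demand (unused in this demo)
    resources: Dict[str, Callable[[], str]] = field(default_factory=dict)


def is_relevant(skill: Skill, task: str) -> bool:
    """Naive keyword match standing in for the model's own relevance judgment."""
    task_lower = task.lower()
    return any(part in task_lower for part in skill.name.lower().split("-"))


def build_context(skills: List[Skill], task: str) -> List[str]:
    """Return metadata for every skill plus full instructions for matches only."""
    context = [f"[skill] {s.name}: {s.description}" for s in skills]
    for s in skills:
        if is_relevant(s, task):
            context.append(s.load_instructions())
    return context


# Record which skills had their full instructions loaded.
loaded: List[str] = []


def make_loader(name: str) -> Callable[[], str]:
    def loader() -> str:
        loaded.append(name)
        return f"# Full instructions for {name}"

    return loader


skills = [
    Skill("pdf", "Extract text and tables from PDFs", make_loader("pdf")),
    Skill("xlsx", "Create and analyze spreadsheets", make_loader("xlsx")),
]
context = build_context(skills, "Extract the tables from this PDF report")
print(context)
print(loaded)
```

Both skills contribute a one-line metadata entry, but only the matching skill pays the cost of its full instructions, which is the point of the design.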
- Go to Settings > Capabilities
- Enable Skills toggle
- Browse available skills or upload custom skills
- For Team/Enterprise: Admin must enable Skills organization-wide first
# Install skills from marketplace
/plugin marketplace add anthropics/skills
# Or install from local directory
/plugin add /path/to/skill-directory
Skills are accessible via the /v1/skills API endpoint. See the Skills API documentation for detailed integration examples.
import anthropic
client = anthropic.Client(api_key="your-api-key")
# See API docs for full implementation details
Skills for working with complex file formats:
- docx - Create, edit, and analyze Word documents with support for tracked changes, comments, formatting preservation, and text extraction
- pdf - Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms
- pptx - Create, edit, and analyze PowerPoint presentations with support for layouts, templates, charts, and automated slide generation
- xlsx - Create, edit, and analyze Excel spreadsheets with support for formulas, formatting, data analysis, and visualization
- algorithmic-art - Create generative art using p5.js with seeded randomness, flow fields, and particle systems
- canvas-design - Design beautiful visual art in .png and .pdf formats using design philosophies
- slack-gif-creator - Create animated GIFs optimized for Slack's size constraints
- frontend-design - Instructs Claude to avoid "AI slop" or generic aesthetics and to make bold design decisions. Works very well for React & Tailwind.
- web-artifacts-builder - Build complex claude.ai HTML artifacts using React, Tailwind CSS, and shadcn/ui components
- mcp-builder - Guide for creating high-quality MCP servers to integrate external APIs and services
- webapp-testing - Test local web applications using Playwright for UI verification and debugging
- oh-my-claudecode - Multi-agent orchestration for Claude Code. Zero learning curve.
- oh-my-codex - Start Codex stronger, then let OMX add better prompts, workflows, and runtime help when the work grows.
- oh-my-openagent - for opencode
- planning-with-files - Claude Code skill implementing Manus-style persistent markdown planning -- the workflow pattern behind the $2B acquisition.
- ui-ux-pro-max-skill - An AI skill that provides design intelligence for building professional UI/UX across multiple platforms
- Code Review Plugin - Automated code review for pull requests using multiple specialized agents with confidence-based scoring to filter false positives.
- code-simplifier -
- ralph-loop -
- mcp-builder -
- skillsmp -
- brand-guidelines - Apply Anthropic's official brand colors and typography to artifacts
- internal-comms - Write internal communications like status reports, newsletters, and FAQs
- skill-creator - Interactive skill creation tool that guides you through building new skills with Q&A
Warning
Skills can execute arbitrary code in Claude's environment.
See Skills Security, Operations, and FAQ for more information.
- obra/superpowers - Core skills library for Claude Code with 20+ battle-tested skills including TDD, debugging, and collaboration patterns
  - Features /brainstorm, /write-plan, /execute-plan commands and skills-search tool
  - superpowers-skills - Community-editable skills repository
  - Blog: Superpowers - Author's overview by Jesse Vincent
  - Installation: /plugin marketplace add obra/superpowers-marketplace
- obra/superpowers-lab - Experimental skills for Claude Code Superpowers (see above)
  - Uses new techniques that are still being refined and tested (i.e. skills here may change over time)
  - Blog post about its development
  - Install from superpowers-marketplace plugin
These will be broken down into categories once there are enough community skills available to list.
| Skill | Description |
|---|---|
| ios-simulator-skill | iOS app building, navigation, and testing through automation |
| ffuf-web-fuzzing | Expert guidance for ffuf web fuzzing during penetration testing, including authenticated fuzzing with raw requests, auto-calibration, and result analysis |
| playwright-skill | General-purpose browser automation using Playwright |
| claude-d3js-skill | Visualizations in d3.js |
| claude-scientific-skills | Comprehensive collection of ready-to-use scientific skills, including working with specialized scientific libraries and databases |
| web-asset-generator | Generates web assets like favicons, app icons, and social media images |
| loki-mode | Multi-agent autonomous startup system - orchestrates 37 AI agents across 6 swarms to build, deploy, and operate a complete startup from PRD to revenue |
| Trail of Bits Security Skills | Security skills for static analysis with CodeQL/Semgrep, variant analysis, code auditing, and vulnerability detection |
| frontend-slides | Create animation-rich HTML presentations -- from scratch or by converting PowerPoint files |
| Expo Skills | Official skills by the Expo team for developing Expo apps |
| shadcn/ui | Give Claude Code context on shadcn components as well as pattern enforcement |
More community skills coming soon! Submit a PR to add your skill.
- yusufkaraaslan/Skill_Seekers - Convert documentation websites into Claude Skills
- claude-hud - A Claude Code plugin that shows what's happening -- context usage, active tools, running agents, and todo progress. Always visible below your input.
- Claude-to-IM - Bridge Claude Code / Codex to IM platforms -- chat with AI coding agents from Telegram, Discord, Feishu/Lark, or QQ.
- CodePilot - A desktop GUI for Claude Code -- chat, code, and manage projects visually. Built with Electron + Next.js.
Step-by-Step Guide
The easiest way to create a skill is to use the built-in skill-creator:
- Enable the skill-creator skill in Claude
- Ask Claude: "Use the skill-creator to help me build a skill for [your task]"
- Answer the interactive questions about your workflow
- Claude generates the complete skill structure for you
- Create folder structure:

      my-skill/
      ├── SKILL.md          # Main skill file with frontmatter
      ├── scripts/          # Optional executable scripts
      │   └── helper.py
      └── resources/        # Optional supporting files
          └── template.json

- Create SKILL.md with frontmatter:

      ---
      name: my-skill
      description: Brief description for skill discovery (keep concise)
      ---

      # Detailed Instructions

      Claude will read these instructions when the skill is activated.

      ## Usage

      Explain how to use this skill...

      ## Examples

      Provide clear examples...

- Add executable scripts (optional):
  - Python, JavaScript, or other scripts Claude can execute
  - Reference them in your SKILL.md instructions
- Test locally:
  - Install the skill in Claude Code or Claude Desktop
  - Test with relevant tasks
  - Iterate and refine
- Share:
  - Publish to GitHub
  - Submit to this awesome list via PR
  - Share with your team via git repos or internal distribution
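The manual folder and SKILL.md steps above can also be scripted. The sketch below is one possible approach using only the Python standard library; `scaffold_skill` and `SKILL_MD_TEMPLATE` are hypothetical names, and the template is a trimmed version of the SKILL.md example above.

```python
import tempfile
from pathlib import Path

# Trimmed version of the SKILL.md example shown in the steps above.
SKILL_MD_TEMPLATE = """\
---
name: {name}
description: {description}
---

# Detailed Instructions

Claude will read these instructions when the skill is activated.
"""


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create SKILL.md plus empty scripts/ and resources/ folders under root/name."""
    skill_dir = root / name
    (skill_dir / "scripts").mkdir(parents=True, exist_ok=True)
    (skill_dir / "resources").mkdir(parents=True, exist_ok=True)
    (skill_dir / "SKILL.md").write_text(
        SKILL_MD_TEMPLATE.format(name=name, description=description),
        encoding="utf-8",
    )
    return skill_dir


if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_skill(
            Path(tmp), "my-skill", "Brief description for skill discovery"
        )
        for path in sorted(created.rglob("*")):
            print(path.relative_to(created))
```

Point `root` at a real skills directory instead of a temporary one once the layout looks right.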
- Keep descriptions concise - The frontmatter description is used for skill discovery
- Use clear, actionable instructions - Write instructions as if for a human collaborator
- Include examples - Show specific examples in your SKILL.md
- Version your skills - Use git tags for version management
- Document dependencies - List any prerequisites or required packages
- Test thoroughly - Verify your skill works across different scenarios
- What are Skills? - Official support article explaining Claude Skills
- Using Skills in Claude - How to enable and use skills
- Claude Skills Announcement - Official announcement from Anthropic
- Equipping Agents with Skills - Engineering deep dive on Agent Skills
- Claude Developer Platform - Official documentation
- Skills API Endpoint - /v1/skills API documentation
- anthropics/skills - Official public repository for Skills
- Claude Cookbooks - Skills - Example notebooks and tutorials
- How to Create Your First Claude Skill - Step-by-step tutorial with examples
- How to Use Skills in Claude Code - Installation, project scoping, and testing guide
Video tutorials coming soon! Have a good video about Claude Skills? Submit a PR!
Example topics we'd love to see
- Getting started with Claude Skills
- Building your first custom skill
- Skills vs MCP comparison
- Enterprise deployment strategies
- Skills Explained - Official Anthropic blog post covering progressive disclosure, use cases, and when to use Skills vs other tools
- Simon Willison: Claude Skills are awesome, maybe a bigger deal than MCP - Technical deep dive and analysis
- Nov 13: Anthropic publishes Skills Explained - Comprehensive guide covering progressive disclosure architecture, decision matrices for Skills vs Prompts/Subagents/Projects, and best practices
- Oct 18: Major community repositories emerge: obra/superpowers skills library
- Oct 17: Community publishes practical tutorials on DEV.to and Medium
- Oct 16: Claude Skills officially announced - Available across Claude.ai, Code, and API
- Oct 16: Initial skills released including docx, pdf, pptx, xlsx, algorithmic-art, canvas-design, and more
| Tool | Best For |
|---|---|
| Skills | Reusable procedural knowledge across conversations |
| Prompts | One-time instructions and immediate context |
| Projects | Persistent background knowledge within workspaces |
| Subagents | Independent task execution with specific permissions |
| MCP | Connecting Claude to external data sources |
Use Skills when: Capabilities should be accessible to any Claude instance. They're portable expertise.
Use Subagents when: You need self-contained agents designed for specific purposes with independent workflows and restricted tool access.
Combined approach: Subagents can leverage Skills for specialized expertise, merging independence with portable knowledge.
Key insight: If you find yourself typing the same prompt repeatedly across multiple conversations, it's time to create a Skill.
| Feature | Skills | MCP |
|---|---|---|
| Purpose | Task-specific expertise and workflows | External data/API integration |
| Portability | Same format everywhere (Claude.ai, Code, API) | Requires server configuration |
| Code Execution | Can include executable scripts | Provides tools/resources |
| Token Efficiency | ~100 tokens of metadata per skill until loaded | Varies by implementation |
| Best For | Repeatable tasks, document workflows | Database access, API integrations |
Use Together: Skills can create MCP servers! The mcp-builder skill helps build high-quality MCP integrations.
| Feature | Skills | System Prompts |
|---|---|---|
| Structure | Folder with YAML frontmatter, instructions, scripts | Plain text instructions |
| Reusability | Version-controlled, shareable, composable | Copy-paste, conversation-specific |
| Loading | On-demand (only when relevant) | Always in context |
| Maintenance | Centralized updates | Manual updates per conversation |
| Composability | Multiple skills stack automatically | Manual combination |
Security Guidelines & Best Practices
- Only install skills from trusted sources
- Review SKILL.md and all scripts before enabling a skill
- Be cautious of skills that request sensitive data access
- Audit carefully before deploying to production or enterprise environments
- Malicious skills may introduce vulnerabilities or enable data exfiltration
- Prompt injection attacks could be amplified through compromised skills
- Sandboxing limitations - Understand the security model before enterprise deployment
- Security research: Weaponizing Claude Code Skills - Analysis of potential security risks
- Version control - Track all skills in git with proper version tags
- Code review - Peer review custom skills before team distribution
- Least privilege - Only grant necessary permissions and access
- Regular audits - Periodically review installed skills
- Documentation - Maintain clear documentation for custom skills
- Testing - Thoroughly test skills in non-production environments first
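One way to support the review and audit practices above is to record a hash of every file in a skill at review time and compare it again later. The sketch below assumes skills live as plain folders on disk; `inventory_skill` and `changed_files` are hypothetical helper names, not part of any Claude tooling.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def inventory_skill(skill_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to the skill folder) to its SHA-256 digest."""
    digests: Dict[str, str] = {}
    for path in sorted(skill_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(skill_dir).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return files that were added, removed, or modified between two inventories."""
    return sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill = Path(tmp) / "my-skill"
        (skill / "scripts").mkdir(parents=True)
        (skill / "SKILL.md").write_text("---\nname: my-skill\n---\n", encoding="utf-8")
        reviewed = inventory_skill(skill)  # snapshot taken at review time

        # Simulate a script appearing after the review.
        (skill / "scripts" / "helper.py").write_text("print('hi')\n", encoding="utf-8")
        print(changed_files(reviewed, inventory_skill(skill)))
```

Any path reported by `changed_files` is a candidate for re-review before the skill is used again.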
- As of October 2025, Claude.ai does not support centralized admin management for custom skills
- Use version control and internal repositories for team skill distribution
- Establish clear policies for skill vetting and approval
- Monitor skill usage and performance
Known Issues & Common Problems
- Linux path bug (Oct 18, 2025): Agent SDK uses hardcoded macOS paths instead of environment home directory
- Issue #268
- Workaround: Manually specify skill paths
- Enterprise distribution: No centralized admin management yet for custom skills on claude.ai
- Use git repositories for team distribution
- API integration provides more control
Skills not appearing in Claude
- Check Settings > Capabilities to ensure Skills are enabled
- For Team/Enterprise: Verify admin has enabled Skills organization-wide
- Restart Claude after installing new skills
Skills not loading/activating
- Verify SKILL.md has proper YAML frontmatter format
- Check that name and description fields are present
- Ensure file structure matches expected format
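The frontmatter checks above can be automated before a skill is installed. The following is a deliberately minimal sketch: it handles flat `key: value` fields only and is not a full YAML parser, and `parse_frontmatter` and `missing_fields` are hypothetical helper names.

```python
from typing import Dict, List

# Fields the troubleshooting steps above say must be present.
REQUIRED_FIELDS = ("name", "description")


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` lines between the leading `---` delimiters.

    Assumption: values are simple one-line strings; nested YAML is not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' line")


def missing_fields(text: str) -> List[str]:
    """Return required frontmatter fields that are absent or empty."""
    fields = parse_frontmatter(text)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


if __name__ == "__main__":
    sample = "---\nname: my-skill\n---\n# Detailed Instructions\n"
    print(missing_fields(sample))
```

For real skills, a proper YAML library would be a safer choice than this hand-rolled parser.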
Permission errors
- Review admin settings for Team/Enterprise accounts
- Check file permissions in skill directories
- Verify API key has appropriate permissions
Skill execution failures
- Check script dependencies are installed
- Review error logs for specific issues
- Test scripts independently outside of Claude
Common Questions
Q: How much do skills impact token usage?
A: Skills are highly efficient thanks to progressive disclosure. Each skill uses only ~100 tokens during metadata scanning to determine relevance. When activated, the full skill content loads at <5k tokens. Bundled resources only load as needed.
Q: What's the difference between Claude Skills and Agent Skills?
A: They are the same thing.
Q: Can I share skills with my team?
A: Yes! Skills can be shared via:
- Git repositories (recommended)
- Internal file sharing
- Claude API for programmatic distribution
- Enterprise-wide deployment features (coming soon)
Q: Do skills work with all Claude models?
A: Availability depends on your plan rather than on the model. Skills are available for Pro, Max, Team, and Enterprise users. Free tier users do not have access to Skills.
Q: Can skills call external APIs?
A: Yes, skills can include scripts that call external APIs. For complex API integrations, consider using MCP (Model Context Protocol) alongside skills.
Q: How does Claude decide which skill to use?
A: Claude scans all available skills' frontmatter (name and description), evaluates relevance to the current task, then loads the full content of relevant skills. Multiple skills can be loaded and composed together automatically.
Q: Can I use Skills and MCP together?
A: Absolutely! They complement each other. Use Skills for task-specific workflows and MCP for external data/API integration. The mcp-builder skill can even help you build MCP servers.
Q: Are there any costs beyond my Claude subscription?
A: No additional costs for using official skills. Community and custom skills are free to use, though some may require external services (APIs, databases, etc.) that have their own costs.
Q: Can I monetize custom skills?
A: Currently, there is no official marketplace for paid skills. Anthropic has mentioned plans for community contributions and a potential marketplace in the future.
Q: How do I update a skill?
A: For skills from git repositories, pull the latest changes. For manually installed skills, replace the skill folder with the updated version. Always test updates in a non-production environment first.
| Project | Why it matters for agent harnesses |
|---|---|
| modelcontextprotocol/modelcontextprotocol | The MCP spec and docs repo; the foundation for modern tool, resource, and prompt integration. |
| modelcontextprotocol/servers | Official reference servers plus the gateway into the wider MCP registry and ecosystem. |
| modelcontextprotocol/typescript-sdk | Official TypeScript SDK for writing MCP clients and servers. |
| modelcontextprotocol/python-sdk | Official Python SDK for MCP clients and servers. |
| github/github-mcp-server | The most important coding-focused MCP server: repos, files, issues, PRs, Actions, security, and more. |
| microsoft/playwright-mcp | Browser automation via MCP using accessibility snapshots instead of pixel-only interaction. |
| punkpeye/awesome-mcp-servers | Broad community index for MCP servers across every domain. |
| CodeAlive-AI/codealive-mcp | A strong example of an MCP-first context engine for large codebases. |
| Project | Stars | Layer | Why it matters |
|---|---|---|---|
| Flux159/mcp-chat | 134 | MCP client / testing | Useful for testing and evaluating MCP servers and agent setups from the client side. |
| cs50victor/claude-code-teams-mcp | 219 | MCP orchestration | Above the target range, but a good example of using MCP to expose team orchestration patterns to harnesses. |
| amxv/mcp-manager | 285 | MCP management UI | Above the target range, but increasingly relevant as harnesses need GUI-level MCP fleet management. |
| Project | Why it matters for agent harnesses |
|---|---|
| letta-ai/letta | Best viewed as a memory-first platform for persistent, stateful agents. |
| mem0ai/mem0 | Universal memory layer for user, session, and agent state. |
| getzep/graphiti | Real-time knowledge graphs for agent memory, retrieval, and historical reasoning; also includes an MCP server. |
| langchain-ai/langgraph | Important here for checkpointing, resumability, and explicit state graphs. |
| CodeAlive-AI/codealive-mcp | Worth revisiting here as a context-engine layer rather than just an MCP endpoint. |
| Project | Stars | Layer | Why it matters |
|---|---|---|---|
| srikanthbellary/openstinger | 114 | Memory harness | Explicitly positions itself as a portable memory harness for agents. |
| aayoawoyemi/Ori-Mnemos | 139 | Persistent memory | Local-first persistent memory system built specifically for agentic workflows. |
| AGI-is-going-to-arrive/Memory-Palace | 211 | Long-term memory OS | Above the target range, but highly relevant as a memory operating system concept for AI agents. |
| Project | Why it matters for agent harnesses |
|---|---|
| browser-use/browser-use | Makes websites accessible to agents; useful for end-to-end verification and browser action loops. |
| e2b-dev/E2B | Secure isolated cloud sandboxes for running AI-generated code. |
| SWE-agent/SWE-ReX | Runtime interface for sandboxed shell execution, local or remote, with strong parallelization support. |
| microsoft/playwright-mcp | Also belongs here as the cleanest browser substrate for MCP-native agents. |
| clawdbot/clawdbot | Interesting if your harness needs channels, device actions, voice, and always-on local control. |
| Project | Stars | Layer | Why it matters |
|---|---|---|---|
| llm-platform-security/SecGPT | 109 | Isolation architecture | Focuses on execution isolation for LLM-based agent systems. |
| Cloudgeni-ai/infrastructure-agents-guide | 125 | Safe operations | Guide repo centered on architecture, sandboxing, credentials, change control, and observability for infra agents. |
| PACHAKUTlQ/ClaudeCage | 138 | Sandboxed runtime | Portable sandbox wrapper for Claude Code style workflows. |
| mattolson/agent-sandbox | 157 | Local sandbox | Local secure dev environment for agent collaboration. |
| OpenSource03/harnss | 147 | Desktop harness shell | Desktop UI for Claude Code, Codex, and ACP-compatible agents with terminal, browser, Git, and MCP visualization. |
| EDEAI/OpenFlux | 173 | Desktop agent client | Slightly above the target range, but notable for long-term memory, browser automation, and tool orchestration in a local client. |
| Project | Why it matters for agent harnesses |
|---|---|
| langfuse/langfuse | Full LLM engineering platform for tracing, evals, prompts, datasets, and production debugging. |
| Arize-ai/phoenix | Open-source observability and evaluation platform for tracing and troubleshooting agent runs. |
| promptfoo/promptfoo | Practical eval, CI, red-teaming, and vulnerability scanning for prompts, RAG, and agents. |
| truera/trulens | Evaluation and tracking framework for LLM applications and agents. |
| invariantlabs-ai/invariant | Rule-based guardrails layer that can sit between your app and MCP or LLM providers. |
| invariantlabs-ai/mcp-scan | MCP-specific security scanner and proxy for inspection, logging, and runtime enforcement. |
| Project | Stars | Layer | Why it matters |
|---|---|---|---|
| METR/vivaria | 135 | Evaluation harness | METR's evaluation and elicitation research tooling; highly relevant for serious agent evaluation. |
| scabench-org/scabench | 105 | Audit-agent eval | Framework for evaluating AI audit agents on recent real-world data. |
| philschmid/ai-agent-benchmark-compendium | 112 | Benchmark index | Curated benchmark map for agent evaluation across coding, tool use, reasoning, and computer interaction. |
| arklexai/arksim | 112 | Error simulation / eval | Helps surface agent failures before they hit real users. |
| Mengmeara/agent-safe-probe-x | 83 | Safety evaluation | Focused framework for automated safety evaluation of intelligent agents. |
These are the former Suggested Build Stacks, reframed as cross-layer harness compositions. Each composition combines multiple layers from the map above rather than acting like an isolated appendix.
Best when you want the smallest useful end-to-end coding loop.
- Runtime layer: openai/codex or anomalyco/opencode
- Capability layer: github/github-mcp-server
- Browser / execution layer: microsoft/playwright-mcp
- Eval layer: promptfoo/promptfoo
- Observability layer: langfuse/langfuse
Best when MCP is your primary capability fabric and you want the rest of the harness to follow that design.
- Runtime / framework layer: langchain-ai/deepagents or lastmile-ai/mcp-agent
- Protocol layer: modelcontextprotocol/modelcontextprotocol
- Registry layer: modelcontextprotocol/servers
- Memory layer: mem0ai/mem0 or getzep/graphiti
- Observability / eval layer: Arize-ai/phoenix
Best when approval, isolation, scanning, and secure workflows matter more than raw autonomy.
- Runtime layer: cline/cline or block/goose
- Skills / behavior layer: trailofbits/skills
- Guardrails layer: invariantlabs-ai/invariant
- MCP inspection layer: invariantlabs-ai/mcp-scan
- Eval layer: promptfoo/promptfoo
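Each composition above can be treated as data and checked against the layer map to see which layers it leaves open. The sketch below encodes the first composition with `openai/codex` as the runtime choice. Assigning each component to a map layer is this sketch's own interpretation, and `uncovered_layers` is a hypothetical helper.

```python
from typing import Dict, List

# Layer names from the Harness Layer Map. "Reference Harness Compositions"
# is left out because a composition is itself that layer.
MAP_LAYERS = [
    "Harness-First Runtimes and Coding Agents",
    "Frameworks, Planning, Orchestration, and Agent Protocols",
    "Skills and Reusable Behavior Packs",
    "MCP and Capability Fabric",
    "Memory, State, and Context Systems",
    "Browser, Sandbox, Execution, and Operator Surfaces",
    "Observability, Evals, and Guardrails",
]

# First composition above; the layer assignments are an interpretation.
MINIMAL_CODING_HARNESS: Dict[str, str] = {
    "openai/codex": "Harness-First Runtimes and Coding Agents",
    "github/github-mcp-server": "MCP and Capability Fabric",
    "microsoft/playwright-mcp": "Browser, Sandbox, Execution, and Operator Surfaces",
    "promptfoo/promptfoo": "Observability, Evals, and Guardrails",
    "langfuse/langfuse": "Observability, Evals, and Guardrails",
}


def uncovered_layers(composition: Dict[str, str]) -> List[str]:
    """Return map layers that no component in the composition is assigned to."""
    used = set(composition.values())
    return [layer for layer in MAP_LAYERS if layer not in used]


if __name__ == "__main__":
    for layer in uncovered_layers(MINIMAL_CODING_HARNESS):
        print("not covered:", layer)
```

An uncovered layer is not necessarily a gap; a minimal coding loop may simply not need persistent memory or a skills pack.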
- research/agent-harness-architecture-2026.md - architecture blueprint and implementation guidance
- research/model-is-not-key-harness-is.md - Chinese synthesis of the 2026 harness engineering shift
- research/deep-research-report.md - supporting research notes
PRs are welcome. Please prefer:
- official upstream repos over mirrors
- registries over long-tail one-off entries when a category is exploding
- concise descriptions that explain why a repo is harness-related
- exact links to GitHub repos, not marketing pages
If you are submitting a Skill, keep the skill catalog conventions intact and place the new entry where it best fits within the Skills and Reusable Behavior Packs layer.
See CONTRIBUTING.md for submission guidance.