Awesome Agent Harness

A curated GitHub map of projects for building, operating, and governing agent harnesses, with skills preserved and integrated as a first-class harness layer.

An agent harness is the control plane around an agent:

prompt and context scaffolding
planning, state, checkpoints, and recovery
tool and protocol integration
isolated execution environments
verification, reviewability, and guardrails
reusable skills and behavior packs

This repository started life as a Claude Skills catalog. That history remains intact here, but it is now organized as part of a single Awesome Agent Harness map: skills are treated as a core harness layer rather than a detached appendix.

Quick Navigation

Scope
Search Method
Selection Principles
If You Only Read Five Repos
Harness Layer Map
1. Harness-First Runtimes and Coding Agents
2. Frameworks, Planning, Orchestration, and Agent Protocols
3. Skills and Reusable Behavior Packs
4. MCP and Capability Fabric
5. Memory, State, and Context Systems
6. Browser, Sandbox, Execution, and Operator Surfaces
7. Observability, Evals, and Guardrails
8. Reference Harness Compositions
Related Research in This Repo
Contributing

Scope

This list focuses on high-signal, open-source GitHub projects that are directly useful when building an agent harness.

It does not attempt an exhaustive list of every repo in the ecosystem; that is no longer practical, especially for:

MCP servers
individual skill packs
one-off app wrappers around existing agents

For long-tail discovery, this README points to registries and official indexes.

Search Method

GitHub-wide search terms used in this sweep included:

agent harness
harness engineering
coding agent
agent runtime
stateful agent
model context protocol
mcp server
agent skills
browser agent
agent eval
agent guardrails
secure agent sandbox

Notes:

This sweep was updated on March 22, 2026.
Star counts in the ~100-star layer notes below are GitHub API snapshots from March 22, 2026 and will change over time.
For this README, ~100 stars means roughly 80-160 stars, because exact 100-star filtering is too unstable to be useful in practice.
As of March 22, 2026, OpenDevin/OpenDevin has effectively moved to All-Hands-AI/OpenHands.
As of March 22, 2026, openclaw/openclaw resolves to clawdbot/clawdbot.
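
The star band described in the notes above can be written down as a small check. This is only a sketch: the constant and function names are hypothetical, and the 80 and 160 edges are taken from the "roughly 80-160 stars" note rather than from any tooling in this repo.

```python
# Band edges taken from the "~100 stars means roughly 80-160 stars" note.
# The names below are illustrative and not part of any tooling in this repo.
EMERGING_MIN_STARS = 80
EMERGING_MAX_STARS = 160


def star_band(stars: int) -> str:
    """Classify a star count relative to the ~100-star band."""
    if stars < EMERGING_MIN_STARS:
        return "below"
    if stars > EMERGING_MAX_STARS:
        return "above"
    return "in-band"


# Star counts copied from the layer tables in this README.
for repo, stars in [("SethGammon/Citadel", 125), ("sudocode-ai/sudocode", 248)]:
    print(f"{repo}: {star_band(stars)}")
```

Entries that classify as "above" are the ones the layer tables flag as sitting outside the target range.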

Selection Principles

Projects are prioritized when they do one or more of the following:

implement a full harness runtime
expose a critical harness subsystem such as MCP, memory, sandboxing, evals, or skills
act as an official spec, SDK, or registry
provide patterns that are widely reusable across agent stacks

Projects are deprioritized when they are:

thin wrappers around proprietary services with little reusable harness value
abandoned experiments with no clear ecosystem relevance
one-off MCP servers that are better represented by a registry

If You Only Read Five Repos

openai/codex - terminal-native coding agent shell
langchain-ai/deepagents - explicit open-source harness with planning, filesystem, subagents, MCP, and HITL
modelcontextprotocol/modelcontextprotocol - the protocol layer that now defines most tool integration work
github/github-mcp-server - the most important MCP server for coding workflows
promptfoo/promptfoo - simple, practical evals and red-teaming for agent outputs

Harness Layer Map

This README is organized by harness layers rather than by vendor brand or historical repository state.

The main layers in this map are:

Harness-First Runtimes and Coding Agents: agent shells, coding agents, and repo-native execution loops
Frameworks, Planning, Orchestration, and Agent Protocols: orchestration frameworks, workflow shapers, and agent interoperability layers
Skills and Reusable Behavior Packs: portable instructions, workflow packs, community skills, creation guides, and skill operations
MCP and Capability Fabric: tool protocols, MCP specs, SDKs, registries, and capability management
Memory, State, and Context Systems: persistent memory, state graphs, and long-running context layers
Browser, Sandbox, Execution, and Operator Surfaces: browser control, isolated execution, safe sandboxes, and operator-facing shells
Observability, Evals, and Guardrails: tracing, evals, reviewability, policy, and safety tooling
Reference Harness Compositions: cross-layer build stacks that combine the layers above into practical systems

The former standalone ~100-star section has been redistributed into the relevant layers below as Emerging Repos in This Layer, so the whole document stays centered on Awesome Agent Harness.

1. Harness-First Runtimes and Coding Agents

Project	Why it matters for agent harnesses
openai/codex	Lightweight coding agent for the terminal; a strong reference for CLI-native harness design and repo-driven workflows.
langchain-ai/deepagents	One of the clearest open-source `agent harness` repos: planning, filesystem, shell, subagents, summarization, MCP, and HITL.
anomalyco/opencode	Open-source coding agent with built-in `build` and `plan` modes, provider-agnostic design, and strong TUI ergonomics.
All-Hands-AI/OpenHands	End-to-end software engineering agent platform; useful as a larger harness/control-plane reference.
Aider-AI/aider	Terminal pair-programming agent; a mature reference for repo-aware edit/apply/test loops.
continuedev/continue	Open-source CLI and IDE agent system with TUI and headless modes for background workflows.
cline/cline	IDE-native autonomous coding agent with explicit human approval, browser use, checkpoints, and MCP extension points.
block/goose	Extensible local agent that can install, execute, edit, and test with any LLM; good for MCP-heavy local harness patterns.
SWE-agent/SWE-agent	Research-heavy software engineering agent that stays useful as a harness reference for tool bundles, configs, and benchmarkable runs.
SWE-agent/mini-swe-agent	Minimal baseline harness showing how far a simple bash-first loop can go without a giant scaffold.
clawdbot/clawdbot	OpenClaw's current repo; useful if you care about always-on personal agent control planes, skills, channels, and device actions.

2. Frameworks, Planning, Orchestration, and Agent Protocols

Project	Why it matters for agent harnesses
langchain-ai/langgraph	Low-level orchestration framework for long-running, stateful, controllable agents.
openai/openai-agents-python	Lightweight multi-agent workflow SDK with handoffs, sessions, tracing, and guardrails.
microsoft/autogen	Mature multi-agent framework with event-driven runtime and a large extension ecosystem.
crewAIInc/crewAI	Lean role-based multi-agent orchestration framework with a big ecosystem and many examples.
agno-agi/agno	Full-stack agent system with runtime, control plane, memory, knowledge, MCP, A2A, and eval hooks.
pydantic/pydantic-ai	Strong choice for typed, production-grade agent workflows with durable execution and approval hooks.
letta-ai/letta	Stateful agent platform focused on advanced memory and persistent agent identity over time.
lastmile-ai/mcp-agent	MCP-native framework that combines simple workflow patterns with durable execution.
a2aproject/A2A	Open Agent2Agent protocol for agent interoperability beyond tool calling.
anthropics/claude-agent-sdk-python	SDK for embedding Claude Agent / Claude Code style behavior into programmable workflows.

Emerging Planning, Workflow-Shaping, and Control-Plane Repos

These repos were previously listed under Emerging ~100-Star Harness Projects. They are now placed here because they shape planning, orchestration, repo instructions, or runtime control.

Project	Stars	Layer	Why it matters
trevor-nichols/agentrules-architect	109	Prompt scaffold / planning	Generates `AGENTS.md` / `CLAUDE.md` style rule files plus ExecPlan-oriented harness structure.
Codename-Inc/spectre	116	Workflow scaffold	Encodes `/Scope -> /Plan -> /Execute -> /Clean -> /Test -> /Evaluate` as a reusable coding workflow harness.
sudocode-ai/sudocode	248	Repo workflow layer	Above the target range, but useful as a repo-local orchestration layer that lives with the codebase itself.
SethGammon/Citadel	125	Orchestration runtime	Claude Code team harness with routing, worktrees, lifecycle hooks, circuit breakers, and campaign persistence.
go-a2a/adk-go	99	Agent runtime / deployment	Go toolkit for building, evaluating, and deploying controlled agent systems.
Mercor-Intelligence/archipelago	134	Execution harness / eval	Harness for running and evaluating AI agents against RL environments.
jpicklyk/task-orchestrator	170	Task orchestration	Slightly above the target range, but notable for persistent work tracking and context storage across sessions and agents.

3. Skills and Reusable Behavior Packs

Skills are a first-class harness layer. They package reusable instructions, scripts, templates, and workflows that sit between raw prompts and external tools. In a real harness, skills help turn repeated behavior into portable, versioned, reviewable assets.

This section intentionally preserves the repository's historical Claude Skills catalog, but it now lives inside the Awesome Agent Harness map as the reusable behavior layer rather than a detached second theme.

Layer-Level Skill Repos and Behavior-Pack Upstreams

Project	Why it matters for agent harnesses
anthropics/skills	Official public repo for Claude skills, skill examples, templates, and the skill spec.
openai/skills	Codex-focused skills catalog; useful for understanding the cross-agent skills pattern.
vercel-labs/skills	Cross-agent skills CLI that installs skills into Codex, Claude Code, Cursor, OpenCode, OpenClaw, and more.
obra/superpowers	A full workflow system built from composable skills, plans, subagents, worktrees, and review loops.
trailofbits/skills	Security-heavy skill marketplace for audits, CodeQL, Semgrep, diff review, and secure dev workflows.
expo/skills	Official Expo team skill pack for building, deploying, and debugging Expo apps.

Why Skills Belong in a Harness

Skills employ a progressive disclosure architecture for efficiency:

Metadata loading (~100 tokens): Claude scans available Skills to identify relevant matches
Full instructions (<5k tokens): Load when Claude determines the Skill applies
Bundled resources: Files and executable code load only as needed

This design allows multiple Skills to remain available without overwhelming Claude's context window.
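
The three stages above can be sketched in code. This is a minimal illustration of the idea, not how Claude implements it: the function names, the flat `key: value` frontmatter parsing, and the `*/SKILL.md` directory layout are all assumptions made for the example.

```python
from pathlib import Path


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, instruction body).

    Assumption: frontmatter is flat `key: value` lines between two `---`
    lines. Real skills use YAML, so a YAML parser would be more robust.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    fields: dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[index + 1:]).strip()
            return fields, body
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat as having no frontmatter


def scan_metadata(skills_root: Path) -> dict[str, str]:
    """Stage 1: collect only name -> description for every skill."""
    catalog = {}
    for skill_file in sorted(skills_root.glob("*/SKILL.md")):
        fields, _ = split_skill(skill_file.read_text(encoding="utf-8"))
        if "name" in fields and "description" in fields:
            catalog[fields["name"]] = fields["description"]
    return catalog


def load_instructions(skills_root: Path, folder: str) -> str:
    """Stage 2: load the full instruction body for one selected skill."""
    text = (skills_root / folder / "SKILL.md").read_text(encoding="utf-8")
    return split_skill(text)[1]


def load_resource(skills_root: Path, folder: str, relative_path: str) -> bytes:
    """Stage 3: read a bundled file only when the instructions call for it."""
    return (skills_root / folder / relative_path).read_bytes()
```

In this sketch only the output of `scan_metadata` would sit in context by default; `load_instructions` and `load_resource` run on demand for the skill that was actually selected.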

Getting Started with Skills

Claude.ai Web Interface

Go to Settings > Capabilities
Enable Skills toggle
Browse available skills or upload custom skills
For Team/Enterprise: Admin must enable Skills organization-wide first

Claude Code CLI

# Install skills from marketplace
/plugin marketplace add anthropics/skills

# Or install from local directory
/plugin add /path/to/skill-directory

Claude API

Skills are accessible via the /v1/skills API endpoint. See the Skills API documentation for detailed integration examples.

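import anthropic

# anthropic.Anthropic is the SDK's documented client class. If api_key is
# omitted, it reads the ANTHROPIC_API_KEY environment variable instead.
client = anthropic.Anthropic(api_key="your-api-key")
# See API docs for full implementation details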

Official Skills

Document Skills

Skills for working with complex file formats:

docx - Create, edit, and analyze Word documents with support for tracked changes, comments, formatting preservation, and text extraction
pdf - Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms
pptx - Create, edit, and analyze PowerPoint presentations with support for layouts, templates, charts, and automated slide generation
xlsx - Create, edit, and analyze Excel spreadsheets with support for formulas, formatting, data analysis, and visualization

Design and Creative

algorithmic-art - Create generative art using p5.js with seeded randomness, flow fields, and particle systems
canvas-design - Design beautiful visual art in .png and .pdf formats using design philosophies
slack-gif-creator - Create animated GIFs optimized for Slack's size constraints

Development

frontend-design - Instructs Claude to avoid "AI slop" or generic aesthetics and to make bold design decisions. Works very well for React & Tailwind.
web-artifacts-builder - Build complex claude.ai HTML artifacts using React, Tailwind CSS, and shadcn/ui components
mcp-builder - Guide for creating high-quality MCP servers to integrate external APIs and services
webapp-testing - Test local web applications using Playwright for UI verification and debugging
oh-my-claudecode - Multi-agent orchestration for Claude Code. Zero learning curve.
oh-my-codex - Start Codex stronger, then let OMX add better prompts, workflows, and runtime help when the work grows.
oh-my-openagent - for opencode
planning-with-files - Claude Code skill implementing Manus-style persistent markdown planning -- the workflow pattern behind the $2B acquisition.
ui-ux-pro-max-skill - An AI skill that provides design intelligence for building professional UI/UX across multiple platforms
Code Review Plugin - Automated code review for pull requests using multiple specialized agents with confidence-based scoring to filter false positives.
code-simplifier
ralph-loop
skillsmp

Communication

brand-guidelines - Apply Anthropic's official brand colors and typography to artifacts
internal-comms - Write internal communications like status reports, newsletters, and FAQs

Skill Creation

skill-creator - Interactive skill creation tool that guides you through building new skills with Q&A

Community Skills

Warning

Skills can execute arbitrary code in Claude's environment.

See Skills Security, Operations, and FAQ for more information.

Collections and Libraries

obra/superpowers - Core skills library for Claude Code with 20+ battle-tested skills including TDD, debugging, and collaboration patterns
- Features /brainstorm, /write-plan, /execute-plan commands and skills-search tool
- superpowers-skills - Community-editable skills repository
- Blog: Superpowers - Author's overview by Jesse Vincent
- Installation: /plugin marketplace add obra/superpowers-marketplace
obra/superpowers-lab - Experimental skills for Claude Code Superpowers (see above)
- Uses new techniques that are still being refined and tested (i.e. skills here may change over time)
- Blog post about its development
- Install from superpowers-marketplace plugin

Individual Skills

These will be broken down into categories once there are enough community skills available to list.

Skill	Description
ios-simulator-skill	iOS app building, navigation, and testing through automation
ffuf-web-fuzzing	Expert guidance for ffuf web fuzzing during penetration testing, including authenticated fuzzing with raw requests, auto-calibration, and result analysis
playwright-skill	General-purpose browser automation using Playwright
claude-d3js-skill	Visualizations in d3.js
claude-scientific-skills	Comprehensive collection of ready-to-use scientific skills, including working with specialized scientific libraries and databases
web-asset-generator	Generates web assets like favicons, app icons, and social media images
loki-mode	Multi-agent autonomous startup system - orchestrates 37 AI agents across 6 swarms to build, deploy, and operate a complete startup from PRD to revenue
Trail of Bits Security Skills	Security skills for static analysis with CodeQL/Semgrep, variant analysis, code auditing, and vulnerability detection
frontend-slides	Create animation-rich HTML presentations -- from scratch or by converting PowerPoint files
Expo Skills	Official skills by the Expo team for developing Expo apps
shadcn/ui	Give Claude Code context on shadcn components as well as pattern enforcement

More community skills coming soon! Submit a PR to add your skill.

Skill Tools and UIs

Tools

yusufkaraaslan/Skill_Seekers - Convert documentation websites into Claude Skills
claude-hud - A Claude Code plugin that shows what's happening -- context usage, active tools, running agents, and todo progress. Always visible below your input.
Claude-to-IM - Bridge Claude Code / Codex to IM platforms -- chat with AI coding agents from Telegram, Discord, Feishu/Lark, or QQ.

UI

CodePilot - A desktop GUI for Claude Code -- chat, code, and manage projects visually. Built with Electron + Next.js.

Creating and Publishing Skills

Step-by-Step Guide

Method 1: Use skill-creator (Recommended)

The easiest way to create a skill is to use the built-in skill-creator:

Enable the skill-creator skill in Claude
Ask Claude: "Use the skill-creator to help me build a skill for [your task]"
Answer the interactive questions about your workflow
Claude generates the complete skill structure for you

Method 2: Manual Creation

Create folder structure:

my-skill/
├── SKILL.md          # Main skill file with frontmatter
├── scripts/          # Optional executable scripts
│   └── helper.py
└── resources/        # Optional supporting files
    └── template.json

Create SKILL.md with frontmatter:

---
name: my-skill
description: Brief description for skill discovery (keep concise)
---

# Detailed Instructions

Claude will read these instructions when the skill is activated.

## Usage
Explain how to use this skill...

## Examples
Provide clear examples...

Add executable scripts (optional):
- Python, JavaScript, or other scripts Claude can execute
- Reference them in your SKILL.md instructions
Test locally:
- Install the skill in Claude Code or Claude Desktop
- Test with relevant tasks
- Iterate and refine
Share:
- Publish to GitHub
- Submit to this awesome list via PR
- Share with your team via git repos or internal distribution
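
The manual steps above can be scripted. The sketch below creates the folder layout and a starter SKILL.md; the helper names are hypothetical, and the generated headings are placeholders to replace with real instructions.

```python
from pathlib import Path


def render_skill_md(name: str, description: str) -> str:
    """Return starter SKILL.md text with the two frontmatter fields."""
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n"
        "\n"
        "# Detailed Instructions\n"
        "\n"
        "## Usage\n"
        "\n"
        "## Examples\n"
    )


def scaffold_skill(parent: Path, name: str, description: str) -> Path:
    """Create <parent>/<name>/ with SKILL.md, scripts/, and resources/."""
    skill_dir = parent / name
    # exist_ok=False so an existing skill folder is never overwritten.
    skill_dir.mkdir(parents=True, exist_ok=False)
    (skill_dir / "scripts").mkdir()
    (skill_dir / "resources").mkdir()
    (skill_dir / "SKILL.md").write_text(
        render_skill_md(name, description), encoding="utf-8"
    )
    return skill_dir
```

A call such as `scaffold_skill(Path.cwd(), "my-skill", "Brief description for skill discovery")` would produce the layout shown in Method 2, minus the optional helper.py and template.json files.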

Best Practices

Keep descriptions concise - The frontmatter description is used for skill discovery
Use clear, actionable instructions - Write instructions as if for a human collaborator
Include examples - Show specific examples in your SKILL.md
Version your skills - Use git tags for version management
Document dependencies - List any prerequisites or required packages
Test thoroughly - Verify your skill works across different scenarios

Skills Docs, Tutorials, and Articles

Official Documentation and Resources

Getting Started

What are Skills? - Official support article explaining Claude Skills
Using Skills in Claude - How to enable and use skills

Documentation

Claude Skills Announcement - Official announcement from Anthropic
Equipping Agents with Skills - Engineering deep dive on Agent Skills
Claude Developer Platform - Official documentation
Skills API Endpoint - /v1/skills API documentation

Repositories and Examples

anthropics/skills - Official public repository for Skills
Claude Cookbooks - Skills - Example notebooks and tutorials

Tutorials and Guides

Written Tutorials

How to Create Your First Claude Skill - Step-by-step tutorial with examples
How to Use Skills in Claude Code - Installation, project scoping, and testing guide

Video Tutorials

Video tutorials coming soon! Have a good video about Claude Skills? Submit a PR!

Example topics we'd love to see

Getting started with Claude Skills
Building your first custom skill
Skills vs MCP comparison
Enterprise deployment strategies

Articles and Blog Posts

Skills Explained - Official Anthropic blog post covering progressive disclosure, use cases, and when to use Skills vs other tools
Simon Willison: Claude Skills are awesome, maybe a bigger deal than MCP - Technical deep dive and analysis

Recent Updates in the Skills Ecosystem

November 2025

Nov 13: Anthropic publishes Skills Explained - Comprehensive guide covering progressive disclosure architecture, decision matrices for Skills vs Prompts/Subagents/Projects, and best practices

October 2025

Oct 18: Major community repositories emerge: obra/superpowers skills library
Oct 17: Community publishes practical tutorials on DEV.to and Medium
Oct 16: Claude Skills officially announced - Available across Claude.ai, Code, and API
Oct 16: Initial skills released including docx, pdf, pptx, xlsx, algorithmic-art, canvas-design, and more

Skills in the Harness Stack

Quick Reference: When to Use What

Tool	Best For
Skills	Reusable procedural knowledge across conversations
Prompts	One-time instructions and immediate context
Projects	Persistent background knowledge within workspaces
Subagents	Independent task execution with specific permissions
MCP	Connecting Claude to external data sources

Use Skills when: Capabilities should be accessible to any Claude instance. They're portable expertise.

Use Subagents when: You need self-contained agents designed for specific purposes with independent workflows and restricted tool access.

Combined approach: Subagents can leverage Skills for specialized expertise, merging independence with portable knowledge.

Key insight: If you find yourself typing the same prompt repeatedly across multiple conversations, it's time to create a Skill.

Skills vs MCP (Model Context Protocol)

Feature	Skills	MCP
Purpose	Task-specific expertise and workflows	External data/API integration
Portability	Same format everywhere (Claude.ai, Code, API)	Requires server configuration
Code Execution	Can include executable scripts	Provides tools/resources
Token Efficiency	~100 tokens of metadata until loaded	Varies by implementation
Best For	Repeatable tasks, document workflows	Database access, API integrations

Use Together: Skills can create MCP servers! The mcp-builder skill helps build high-quality MCP integrations.

Skills vs System Prompts

Feature	Skills	System Prompts
Structure	Folder with YAML frontmatter, instructions, scripts	Plain text instructions
Reusability	Version-controlled, shareable, composable	Copy-paste, conversation-specific
Loading	On-demand (only when relevant)	Always in context
Maintenance	Centralized updates	Manual updates per conversation
Composability	Multiple skills stack automatically	Manual combination

Skills Security, Operations, and FAQ

Security and Best Practices

⚠️ Important: Skills can execute arbitrary code in Claude's environment. Only install skills from trusted sources.

Security Guidelines & Best Practices

Vetting Skills

Only install skills from trusted sources
Review SKILL.md and all scripts before enabling a skill
Be cautious of skills that request sensitive data access
Audit carefully before deploying to production or enterprise environments

Security Concerns

Malicious skills may introduce vulnerabilities or enable data exfiltration
Prompt injection attacks could be amplified through compromised skills
Sandboxing limitations - Understand the security model before enterprise deployment
Security research: Weaponizing Claude Code Skills - Analysis of potential security risks

Best Practices

Version control - Track all skills in git with proper version tags
Code review - Peer review custom skills before team distribution
Least privilege - Only grant necessary permissions and access
Regular audits - Periodically review installed skills
Documentation - Maintain clear documentation for custom skills
Testing - Thoroughly test skills in non-production environments first

Enterprise Considerations

As of October 2025, Claude.ai does not support centralized admin management for custom skills
Use version control and internal repositories for team skill distribution
Establish clear policies for skill vetting and approval
Monitor skill usage and performance

Troubleshooting

Known Issues & Common Problems

Known Issues

Linux path bug (Oct 18, 2025): Agent SDK uses hardcoded macOS paths instead of environment home directory
- Issue #268
- Workaround: Manually specify skill paths
Enterprise distribution: No centralized admin management yet for custom skills on claude.ai
- Use git repositories for team distribution
- API integration provides more control

Common Problems

Skills not appearing in Claude

Check Settings > Capabilities to ensure Skills are enabled
For Team/Enterprise: Verify admin has enabled Skills organization-wide
Restart Claude after installing new skills

Skills not loading/activating

Verify SKILL.md has proper YAML frontmatter format
Check that name and description fields are present
Ensure file structure matches expected format
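
The frontmatter checks above can be automated before installing a skill. This is a rough sketch: `frontmatter_problems` is a hypothetical helper, and it only understands flat `key: value` frontmatter, so it is not a substitute for a real YAML parser or for whatever validation Claude itself performs.

```python
REQUIRED_FIELDS = ("name", "description")


def frontmatter_problems(skill_md_text: str) -> list[str]:
    """Return human-readable problems found in SKILL.md frontmatter."""
    lines = skill_md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["file does not start with a '---' frontmatter delimiter"]
    try:
        # Index of the line that closes the frontmatter block.
        closing = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return ["frontmatter is never closed with a second '---' line"]

    fields = {}
    for line in lines[1:closing]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    problems = []
    for field in REQUIRED_FIELDS:
        if not fields.get(field):
            problems.append(f"missing or empty field: {field}")
    return problems
```

An empty list means the two required fields were found; anything else is a message to fix before retrying.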

Permission errors

Review admin settings for Team/Enterprise accounts
Check file permissions in skill directories
Verify API key has appropriate permissions

Skill execution failures

Check script dependencies are installed
Review error logs for specific issues
Test scripts independently outside of Claude

Getting Help

FAQ

Common Questions

Q: How much do skills impact token usage?

A: Skills are highly efficient thanks to progressive disclosure. Each skill uses only ~100 tokens during metadata scanning to determine relevance. When activated, the full skill content loads at <5k tokens. Bundled resources only load as needed.
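
As a rough worked example of the figures above: the helper below treats ~100 tokens per installed skill and 5k tokens per activated skill as upper-bound assumptions and ignores bundled resources, so the result is an estimate rather than a measurement. The function name is hypothetical.

```python
METADATA_TOKENS = 100           # "~100 tokens" per installed skill
MAX_INSTRUCTION_TOKENS = 5_000  # "<5k tokens" per activated skill (upper bound)


def estimate_context_tokens(installed: int, activated: int) -> int:
    """Upper-bound token estimate, excluding bundled resources."""
    if not 0 <= activated <= installed:
        raise ValueError("activated must be between 0 and installed")
    return installed * METADATA_TOKENS + activated * MAX_INSTRUCTION_TOKENS


# 20 installed skills, 2 of them activated for the current task.
progressive = estimate_context_tokens(installed=20, activated=2)
# The same 20 skills if every instruction body were loaded up front.
eager = estimate_context_tokens(installed=20, activated=20)
print(progressive, eager)  # 12000 102000
```

Under these assumptions, loading on demand costs about 12k tokens where loading everything would cost about 102k.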

Q: What's the difference between Claude Skills and Agent Skills?

A: They are the same thing.

Q: Can I share skills with my team?

A: Yes! Skills can be shared via:

Git repositories (recommended)
Internal file sharing
Claude API for programmatic distribution
Enterprise-wide deployment features (coming soon)

Q: Do skills work with all Claude models?

A: Access is gated by plan: Skills are available for Pro, Max, Team, and Enterprise users. Free tier users do not have access to Skills.

Q: Can skills call external APIs?

A: Yes, skills can include scripts that call external APIs. For complex API integrations, consider using MCP (Model Context Protocol) alongside skills.

Q: How does Claude decide which skill to use?

A: Claude scans all available skills' frontmatter (name and description), evaluates relevance to the current task, then loads the full content of relevant skills. Multiple skills can be loaded and composed together automatically.

Q: Can I use Skills and MCP together?

A: Absolutely! They complement each other. Use Skills for task-specific workflows and MCP for external data/API integration. The mcp-builder skill can even help you build MCP servers.

Q: Are there any costs beyond my Claude subscription?

A: No additional costs for using official skills. Community and custom skills are free to use, though some may require external services (APIs, databases, etc.) that have their own costs.

Q: Can I monetize custom skills?

A: Currently, there is no official marketplace for paid skills. Anthropic has mentioned plans for community contributions and a potential marketplace in the future.

Q: How do I update a skill?

A: For skills from git repositories, pull the latest changes. For manually installed skills, replace the skill folder with the updated version. Always test updates in a non-production environment first.

4. MCP and Capability Fabric

Project	Why it matters for agent harnesses
modelcontextprotocol/modelcontextprotocol	The MCP spec and docs repo; the foundation for modern tool, resource, and prompt integration.
modelcontextprotocol/servers	Official reference servers plus the gateway into the wider MCP registry and ecosystem.
modelcontextprotocol/typescript-sdk	Official TypeScript SDK for writing MCP clients and servers.
modelcontextprotocol/python-sdk	Official Python SDK for MCP clients and servers.
github/github-mcp-server	The most important coding-focused MCP server: repos, files, issues, PRs, Actions, security, and more.
microsoft/playwright-mcp	Browser automation via MCP using accessibility snapshots instead of pixel-only interaction.
punkpeye/awesome-mcp-servers	Broad community index for MCP servers across every domain.
CodeAlive-AI/codealive-mcp	A strong example of an MCP-first context engine for large codebases.

Emerging Repos in This Layer

Project	Stars	Layer	Why it matters
Flux159/mcp-chat	134	MCP client / testing	Useful for testing and evaluating MCP servers and agent setups from the client side.
cs50victor/claude-code-teams-mcp	219	MCP orchestration	Above the target range, but a good example of using MCP to expose team orchestration patterns to harnesses.
amxv/mcp-manager	285	MCP management UI	Above the target range, but increasingly relevant as harnesses need GUI-level MCP fleet management.

5. Memory, State, and Context Systems

Project	Why it matters for agent harnesses
letta-ai/letta	Best viewed as a memory-first platform for persistent, stateful agents.
mem0ai/mem0	Universal memory layer for user, session, and agent state.
getzep/graphiti	Real-time knowledge graphs for agent memory, retrieval, and historical reasoning; also includes an MCP server.
langchain-ai/langgraph	Important here for checkpointing, resumability, and explicit state graphs.
CodeAlive-AI/codealive-mcp	Worth revisiting here as a context-engine layer rather than just an MCP endpoint.

Emerging Repos in This Layer

Project	Stars	Layer	Why it matters
srikanthbellary/openstinger	114	Memory harness	Explicitly positions itself as a portable memory harness for agents.
aayoawoyemi/Ori-Mnemos	139	Persistent memory	Local-first persistent memory system built specifically for agentic workflows.
AGI-is-going-to-arrive/Memory-Palace	211	Long-term memory OS	Above the target range, but highly relevant as a memory operating system concept for AI agents.

6. Browser, Sandbox, Execution, and Operator Surfaces

Project	Why it matters for agent harnesses
browser-use/browser-use	Makes websites accessible to agents; useful for end-to-end verification and browser action loops.
e2b-dev/E2B	Secure isolated cloud sandboxes for running AI-generated code.
SWE-agent/SWE-ReX	Runtime interface for sandboxed shell execution, local or remote, with strong parallelization support.
microsoft/playwright-mcp	Also belongs here as the cleanest browser substrate for MCP-native agents.
clawdbot/clawdbot	Interesting if your harness needs channels, device actions, voice, and always-on local control.

Emerging Repos in This Layer

Project	Stars	Layer	Why it matters
llm-platform-security/SecGPT	109	Isolation architecture	Focuses on execution isolation for LLM-based agent systems.
Cloudgeni-ai/infrastructure-agents-guide	125	Safe operations	Guide repo centered on architecture, sandboxing, credentials, change control, and observability for infra agents.
PACHAKUTlQ/ClaudeCage	138	Sandboxed runtime	Portable sandbox wrapper for Claude Code style workflows.
mattolson/agent-sandbox	157	Local sandbox	Local secure dev environment for agent collaboration.
OpenSource03/harnss	147	Desktop harness shell	Desktop UI for Claude Code, Codex, and ACP-compatible agents with terminal, browser, Git, and MCP visualization.
EDEAI/OpenFlux	173	Desktop agent client	Slightly above the target range, but notable for long-term memory, browser automation, and tool orchestration in a local client.

7. Observability, Evals, and Guardrails

Project	Why it matters for agent harnesses
langfuse/langfuse	Full LLM engineering platform for tracing, evals, prompts, datasets, and production debugging.
Arize-ai/phoenix	Open-source observability and evaluation platform for tracing and troubleshooting agent runs.
promptfoo/promptfoo	Practical eval, CI, red-teaming, and vulnerability scanning for prompts, RAG, and agents.
truera/trulens	Evaluation and tracking framework for LLM applications and agents.
invariantlabs-ai/invariant	Rule-based guardrails layer that can sit between your app and MCP or LLM providers.
invariantlabs-ai/mcp-scan	MCP-specific security scanner and proxy for inspection, logging, and runtime enforcement.

Emerging Repos in This Layer

Project	Stars	Layer	Why it matters
METR/vivaria	135	Evaluation harness	METR's evaluation and elicitation research tooling; highly relevant for serious agent evaluation.
scabench-org/scabench	105	Audit-agent eval	Framework for evaluating AI audit agents on recent real-world data.
philschmid/ai-agent-benchmark-compendium	112	Benchmark index	Curated benchmark map for agent evaluation across coding, tool use, reasoning, and computer interaction.
arklexai/arksim	112	Error simulation / eval	Helps surface agent failures before they hit real users.
Mengmeara/agent-safe-probe-x	83	Safety evaluation	Focused framework for automated safety evaluation of intelligent agents.

8. Reference Harness Compositions

These are the former Suggested Build Stacks, reframed as cross-layer harness compositions. Each composition combines multiple layers from the map above rather than acting like an isolated appendix.

1. Minimal Coding Harness

Best when you want the smallest useful end-to-end coding loop.

Runtime layer: openai/codex or anomalyco/opencode
Capability layer: github/github-mcp-server
Browser / execution layer: microsoft/playwright-mcp
Eval layer: promptfoo/promptfoo
Observability layer: langfuse/langfuse

2. MCP-Native Harness

Best when MCP is your primary capability fabric and you want the rest of the harness to follow that design.

Runtime / framework layer: langchain-ai/deepagents or lastmile-ai/mcp-agent
Protocol layer: modelcontextprotocol/modelcontextprotocol
Registry layer: modelcontextprotocol/servers
Memory layer: mem0ai/mem0 or getzep/graphiti
Observability / eval layer: Arize-ai/phoenix

3. Security-First Harness

Best when approval, isolation, scanning, and secure workflows matter more than raw autonomy.

Runtime layer: cline/cline or block/goose
Skills / behavior layer: trailofbits/skills
Guardrails layer: invariantlabs-ai/invariant
MCP inspection layer: invariantlabs-ai/mcp-scan
Eval layer: promptfoo/promptfoo
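The three compositions above can also be read as plain data: each one maps a layer to one or more candidate repos. The sketch below transcribes them into a Python dictionary and adds two small helpers, one that finds repos reused across compositions and one that lists the layers where a choice is still open. The dictionary layout, the shortened layer keys, and the helper names (`repos_in`, `shared_repos`, `open_choices`) are illustrative assumptions, not a format this repository defines.

```python
from collections import Counter

# Layer -> candidate repos, transcribed from the three compositions above.
# The layout and the shortened layer keys are assumptions for illustration only.
COMPOSITIONS = {
    "Minimal Coding Harness": {
        "runtime": ["openai/codex", "anomalyco/opencode"],
        "capability": ["github/github-mcp-server"],
        "browser/execution": ["microsoft/playwright-mcp"],
        "eval": ["promptfoo/promptfoo"],
        "observability": ["langfuse/langfuse"],
    },
    "MCP-Native Harness": {
        "runtime/framework": ["langchain-ai/deepagents", "lastmile-ai/mcp-agent"],
        "protocol": ["modelcontextprotocol/modelcontextprotocol"],
        "registry": ["modelcontextprotocol/servers"],
        "memory": ["mem0ai/mem0", "getzep/graphiti"],
        "observability/eval": ["Arize-ai/phoenix"],
    },
    "Security-First Harness": {
        "runtime": ["cline/cline", "block/goose"],
        "skills/behavior": ["trailofbits/skills"],
        "guardrails": ["invariantlabs-ai/invariant"],
        "mcp inspection": ["invariantlabs-ai/mcp-scan"],
        "eval": ["promptfoo/promptfoo"],
    },
}


def repos_in(composition):
    """Return the set of every repo named in one composition."""
    return {repo for repos in composition.values() for repo in repos}


def shared_repos(compositions):
    """Return repos that appear in more than one composition, sorted."""
    counts = Counter()
    for layers in compositions.values():
        counts.update(repos_in(layers))
    return sorted(repo for repo, n in counts.items() if n > 1)


def open_choices(composition):
    """Return the layers that list alternatives instead of a single pick."""
    return [layer for layer, repos in composition.items() if len(repos) > 1]
```

Calling `shared_repos(COMPOSITIONS)` returns `['promptfoo/promptfoo']`, the only repo that appears in more than one composition. `open_choices` shows where a decision is still needed before a composition can be assembled, for example the runtime layer in the Minimal Coding Harness.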

Related Research in This Repo

research/agent-harness-architecture-2026.md - architecture blueprint and implementation guidance
research/model-is-not-key-harness-is.md - Chinese synthesis of the 2026 harness engineering shift
research/deep-research-report.md - supporting research notes

Contributing

PRs are welcome. Please prefer:

official upstream repos over mirrors
registries over long-tail one-off entries when a category is exploding
concise descriptions that explain why a repo is harness-related
exact links to GitHub repos, not marketing pages

If you are submitting a Skill, keep the skill catalog conventions intact and place the new entry where it best fits within the Skills and Reusable Behavior Packs layer.

See CONTRIBUTING.md for submission guidance.
