
wangxumarshall/awesome-agent-harness

 
 


Awesome Agent Harness


A curated GitHub map of projects for building, operating, and governing agent harnesses, with skills preserved and integrated as a first-class harness layer.

An agent harness is the control plane around an agent:

  • prompt and context scaffolding
  • planning, state, checkpoints, and recovery
  • tool and protocol integration
  • isolated execution environments
  • verification, reviewability, and guardrails
  • reusable skills and behavior packs

This repository started life as a Claude Skills catalog. That history remains intact here, but it is now organized as part of a single Awesome Agent Harness map: skills are treated as a core harness layer rather than a detached appendix.

Quick Navigation

Scope

This list focuses on high-signal, open-source GitHub projects that are directly useful when building an agent harness.

It does not attempt an exhaustive list of every repo in the ecosystem. That is no longer practical, especially for:

  • MCP servers
  • individual skill packs
  • one-off app wrappers around existing agents

For long-tail discovery, this README points to registries and official indexes.

Search Method

GitHub-wide search terms used in this sweep included:

agent harness
harness engineering
coding agent
agent runtime
stateful agent
model context protocol
mcp server
agent skills
browser agent
agent eval
agent guardrails
secure agent sandbox

Notes:

  • This sweep was updated on March 22, 2026.
  • Star counts in the ~100-star layer notes below are GitHub API snapshots from March 22, 2026 and will change over time.
  • For this README, ~100 stars means roughly 80-160 stars, because exact 100-star filtering is too unstable to be useful in practice.
  • As of March 22, 2026, OpenDevin/OpenDevin has effectively moved to All-Hands-AI/OpenHands.
  • As of March 22, 2026, openclaw/openclaw resolves to clawdbot/clawdbot.
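
The search terms and the 80-160 star band above can be turned into reproducible queries. The sketch below is an illustration only: the README does not say how the sweep was actually run, so the helper names (`build_query`, `search_url`, `in_star_band`) are hypothetical. The one assumption about GitHub itself is its standard `stars:LOW..HIGH` range qualifier.

```python
from typing import Optional, Tuple
from urllib.parse import quote_plus

# Search terms copied from the list above.
SEARCH_TERMS = [
    "agent harness", "harness engineering", "coding agent", "agent runtime",
    "stateful agent", "model context protocol", "mcp server", "agent skills",
    "browser agent", "agent eval", "agent guardrails", "secure agent sandbox",
]

# This README's working definition of "~100 stars".
STAR_BAND = (80, 160)


def build_query(term: str, star_band: Optional[Tuple[int, int]] = None) -> str:
    """Return a GitHub search query for one term, optionally limited to a star band."""
    query = f'"{term}"'
    if star_band is not None:
        low, high = star_band
        query += f" stars:{low}..{high}"
    return query


def search_url(query: str) -> str:
    """Return a browser URL for a repository search (hypothetical helper)."""
    return "https://github.com/search?type=repositories&q=" + quote_plus(query)


def in_star_band(stars: int, star_band: Tuple[int, int] = STAR_BAND) -> bool:
    """Check whether a star count falls inside the band, inclusive on both ends."""
    low, high = star_band
    return low <= stars <= high


if __name__ == "__main__":
    for term in SEARCH_TERMS:
        print(search_url(build_query(term, STAR_BAND)))
```

Because star counts drift, `in_star_band` is best applied to a fresh snapshot rather than to the numbers recorded in this README.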

Selection Principles

Projects are prioritized when they do one or more of the following:

  • implement a full harness runtime
  • expose a critical harness subsystem such as MCP, memory, sandboxing, evals, or skills
  • act as an official spec, SDK, or registry
  • provide patterns that are widely reusable across agent stacks

Projects are deprioritized when they are:

  • thin wrappers around proprietary services with little reusable harness value
  • abandoned experiments with no clear ecosystem relevance
  • one-off MCP servers that are better represented by a registry

If You Only Read Five Repos

Harness Layer Map

This README is organized by harness layers rather than by vendor brand or historical repository state.

The main layers in this map are:

  1. Harness-First Runtimes and Coding Agents: agent shells, coding agents, and repo-native execution loops
  2. Frameworks, Planning, Orchestration, and Agent Protocols: orchestration frameworks, workflow shapers, and agent interoperability layers
  3. Skills and Reusable Behavior Packs: portable instructions, workflow packs, community skills, creation guides, and skill operations
  4. MCP and Capability Fabric: tool protocols, MCP specs, SDKs, registries, and capability management
  5. Memory, State, and Context Systems: persistent memory, state graphs, and long-running context layers
  6. Browser, Sandbox, Execution, and Operator Surfaces: browser control, isolated execution, safe sandboxes, and operator-facing shells
  7. Observability, Evals, and Guardrails: tracing, evals, reviewability, policy, and safety tooling
  8. Reference Harness Compositions: cross-layer build stacks that combine the layers above into practical systems

The former standalone ~100-star section has been redistributed into the relevant layers below as Emerging Repos in This Layer, so the whole document stays centered on Awesome Agent Harness.

1. Harness-First Runtimes and Coding Agents

| Project | Why it matters for agent harnesses |
| --- | --- |
| openai/codex | Lightweight coding agent for the terminal; a strong reference for CLI-native harness design and repo-driven workflows. |
| langchain-ai/deepagents | One of the clearest open-source agent harness repos: planning, filesystem, shell, subagents, summarization, MCP, and HITL. |
| anomalyco/opencode | Open-source coding agent with built-in build and plan modes, provider-agnostic design, and strong TUI ergonomics. |
| All-Hands-AI/OpenHands | End-to-end software engineering agent platform; useful as a larger harness/control-plane reference. |
| Aider-AI/aider | Terminal pair-programming agent; a mature reference for repo-aware edit/apply/test loops. |
| continuedev/continue | Open-source CLI and IDE agent system with TUI and headless modes for background workflows. |
| cline/cline | IDE-native autonomous coding agent with explicit human approval, browser use, checkpoints, and MCP extension points. |
| block/goose | Extensible local agent that can install, execute, edit, and test with any LLM; good for MCP-heavy local harness patterns. |
| SWE-agent/SWE-agent | Research-heavy software engineering agent that stays useful as a harness reference for tool bundles, configs, and benchmarkable runs. |
| SWE-agent/mini-swe-agent | Minimal baseline harness showing how far a simple bash-first loop can go without a giant scaffold. |
| clawdbot/clawdbot | OpenClaw's current repo; useful if you care about always-on personal agent control planes, skills, channels, and device actions. |

2. Frameworks, Planning, Orchestration, and Agent Protocols

| Project | Why it matters for agent harnesses |
| --- | --- |
| langchain-ai/langgraph | Low-level orchestration framework for long-running, stateful, controllable agents. |
| openai/openai-agents-python | Lightweight multi-agent workflow SDK with handoffs, sessions, tracing, and guardrails. |
| microsoft/autogen | Mature multi-agent framework with event-driven runtime and a large extension ecosystem. |
| crewAIInc/crewAI | Lean role-based multi-agent orchestration framework with a big ecosystem and many examples. |
| agno-agi/agno | Full-stack agent system with runtime, control plane, memory, knowledge, MCP, A2A, and eval hooks. |
| pydantic/pydantic-ai | Strong choice for typed, production-grade agent workflows with durable execution and approval hooks. |
| letta-ai/letta | Stateful agent platform focused on advanced memory and persistent agent identity over time. |
| lastmile-ai/mcp-agent | MCP-native framework that combines simple workflow patterns with durable execution. |
| a2aproject/A2A | Open Agent2Agent protocol for agent interoperability beyond tool calling. |
| anthropics/claude-agent-sdk-python | SDK for embedding Claude Agent / Claude Code style behavior into programmable workflows. |

Emerging Planning, Workflow-Shaping, and Control-Plane Repos

These repos were previously listed under Emerging ~100-Star Harness Projects. They are now placed here because they shape planning, orchestration, repo instructions, or runtime control.

| Project | Stars | Layer | Why it matters |
| --- | --- | --- | --- |
| trevor-nichols/agentrules-architect | 109 | Prompt scaffold / planning | Generates AGENTS.md / CLAUDE.md style rule files plus ExecPlan-oriented harness structure. |
| Codename-Inc/spectre | 116 | Workflow scaffold | Encodes /Scope -> /Plan -> /Execute -> /Clean -> /Test -> /Evaluate as a reusable coding workflow harness. |
| sudocode-ai/sudocode | 248 | Repo workflow layer | Above the target range, but useful as a repo-local orchestration layer that lives with the codebase itself. |
| SethGammon/Citadel | 125 | Orchestration runtime | Claude Code team harness with routing, worktrees, lifecycle hooks, circuit breakers, and campaign persistence. |
| go-a2a/adk-go | 99 | Agent runtime / deployment | Go toolkit for building, evaluating, and deploying controlled agent systems. |
| Mercor-Intelligence/archipelago | 134 | Execution harness / eval | Harness for running and evaluating AI agents against RL environments. |
| jpicklyk/task-orchestrator | 170 | Task orchestration | Slightly above the target range, but notable for persistent work tracking and context storage across sessions and agents. |

3. Skills and Reusable Behavior Packs

Skills are a first-class harness layer. They package reusable instructions, scripts, templates, and workflows that sit between raw prompts and external tools. In a real harness, skills help turn repeated behavior into portable, versioned, reviewable assets.

This section intentionally preserves the repository's historical Claude Skills catalog, but it now lives inside the Awesome Agent Harness map as the reusable behavior layer rather than a detached second theme.

Layer-Level Skill Repos and Behavior-Pack Upstreams

| Project | Why it matters for agent harnesses |
| --- | --- |
| anthropics/skills | Official public repo for Claude skills, skill examples, templates, and the skill spec. |
| openai/skills | Codex-focused skills catalog; useful for understanding the cross-agent skills pattern. |
| vercel-labs/skills | Cross-agent skills CLI that installs skills into Codex, Claude Code, Cursor, OpenCode, OpenClaw, and more. |
| obra/superpowers | A full workflow system built from composable skills, plans, subagents, worktrees, and review loops. |
| trailofbits/skills | Security-heavy skill marketplace for audits, CodeQL, Semgrep, diff review, and secure dev workflows. |
| expo/skills | Official Expo team skill pack for building, deploying, and debugging Expo apps. |

Why Skills Belong in a Harness

Skills employ a progressive disclosure architecture for efficiency:

  1. Metadata loading (~100 tokens): Claude scans available Skills to identify relevant matches
  2. Full instructions (<5k tokens): Load when Claude determines the Skill applies
  3. Bundled resources: Files and executable code load only as needed

This design allows multiple Skills to remain available without overwhelming Claude's context window.
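
The three stages above can be sketched as a lazy-loading structure. This is a minimal illustration under stated assumptions, not how Claude implements skill loading: the `Skill` class, the `match_skills` keyword matcher, and the loader callbacks are all hypothetical, and the keyword overlap merely stands in for the model's own relevance judgment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Skill:
    name: str
    description: str                      # stage 1: metadata, always in context
    load_instructions: Callable[[], str]  # stage 2: invoked only on activation
    resources: Dict[str, Callable[[], bytes]] = field(default_factory=dict)  # stage 3
    _instructions: Optional[str] = None

    def metadata(self) -> str:
        return f"{self.name}: {self.description}"

    def instructions(self) -> str:
        # Load the full instructions once, the first time the skill is activated.
        if self._instructions is None:
            self._instructions = self.load_instructions()
        return self._instructions

    def resource(self, key: str) -> bytes:
        # Bundled files are fetched individually, only when asked for.
        return self.resources[key]()


def match_skills(task: str, skills: List[Skill]) -> List[Skill]:
    """Naive keyword overlap against metadata only (hypothetical relevance check)."""
    words = set(task.lower().split())
    return [s for s in skills if words & set(s.description.lower().split())]


loaded: List[str] = []  # records which skills had their full instructions loaded


def make_loader(name: str, text: str) -> Callable[[], str]:
    def loader() -> str:
        loaded.append(name)
        return text
    return loader


skills = [
    Skill("pdf", "extract text and tables from pdf files", make_loader("pdf", "PDF steps")),
    Skill("xlsx", "analyze excel spreadsheets with formulas", make_loader("xlsx", "XLSX steps")),
]

context = [s.metadata() for s in skills]        # stage 1: every skill's metadata
active = match_skills("merge two pdf files", skills)
context += [s.instructions() for s in active]   # stage 2: only the matching skill
```

In this run only the `pdf` skill pays the cost of its full instructions; the `xlsx` skill contributes a single metadata line.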

Getting Started with Skills

Claude.ai Web Interface

  1. Go to Settings > Capabilities
  2. Enable Skills toggle
  3. Browse available skills or upload custom skills
  4. For Team/Enterprise: Admin must enable Skills organization-wide first

Claude Code CLI

# Install skills from marketplace
/plugin marketplace add anthropics/skills

# Or install from local directory
/plugin add /path/to/skill-directory

Claude API

Skills are accessible via the /v1/skills API endpoint. See the Skills API documentation for detailed integration examples.

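import os

import anthropic

# Read the key from the environment rather than hardcoding it in source.
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
# See API docs for full implementation details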

Official Skills

Document Skills

Skills for working with complex file formats:

  • docx - Create, edit, and analyze Word documents with support for tracked changes, comments, formatting preservation, and text extraction
  • pdf - Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms
  • pptx - Create, edit, and analyze PowerPoint presentations with support for layouts, templates, charts, and automated slide generation
  • xlsx - Create, edit, and analyze Excel spreadsheets with support for formulas, formatting, data analysis, and visualization

Design and Creative

  • algorithmic-art - Create generative art using p5.js with seeded randomness, flow fields, and particle systems
  • canvas-design - Design beautiful visual art in .png and .pdf formats using design philosophies
  • slack-gif-creator - Create animated GIFs optimized for Slack's size constraints

Development

  • frontend-design - Instructs Claude to avoid "AI slop" or generic aesthetics and to make bold design decisions. Works very well for React & Tailwind.
  • web-artifacts-builder - Build complex claude.ai HTML artifacts using React, Tailwind CSS, and shadcn/ui components
  • mcp-builder - Guide for creating high-quality MCP servers to integrate external APIs and services
  • webapp-testing - Test local web applications using Playwright for UI verification and debugging
  • oh-my-claudecode - Multi-agent orchestration for Claude Code. Zero learning curve.
  • oh-my-codex - Start Codex stronger, then let OMX add better prompts, workflows, and runtime help when the work grows.
  • oh-my-openagent - For use with opencode.
  • planning-with-files - Claude Code skill implementing Manus-style persistent markdown planning -- the workflow pattern behind the $2B acquisition.
  • ui-ux-pro-max-skill - An AI skill that provides design intelligence for building professional UI/UX across multiple platforms
  • Code Review Plugin - Automated code review for pull requests using multiple specialized agents with confidence-based scoring to filter false positives.
  • code-simplifier
  • ralph-loop
  • skillsmp

Communication

  • brand-guidelines - Apply Anthropic's official brand colors and typography to artifacts
  • internal-comms - Write internal communications like status reports, newsletters, and FAQs

Skill Creation

  • skill-creator - Interactive skill creation tool that guides you through building new skills with Q&A

Community Skills

Warning

Skills can execute arbitrary code in Claude's environment.

See Skills Security, Operations, and FAQ for more information.

Collections and Libraries

  • obra/superpowers - Core skills library for Claude Code with 20+ battle-tested skills including TDD, debugging, and collaboration patterns
    • Features /brainstorm, /write-plan, /execute-plan commands and skills-search tool
    • superpowers-skills - Community-editable skills repository
    • Blog: Superpowers - Author's overview by Jesse Vincent
    • Installation: /plugin marketplace add obra/superpowers-marketplace
  • obra/superpowers-lab - Experimental skills for Claude Code Superpowers (see above)
    • Uses new techniques that are still being refined and tested (i.e. skills here may change over time)
    • Blog post about its development
    • Install from superpowers-marketplace plugin

Individual Skills

These will be broken down into categories once there are enough community skills available to list.

| Skill | Description |
| --- | --- |
| ios-simulator-skill | iOS app building, navigation, and testing through automation |
| ffuf-web-fuzzing | Expert guidance for ffuf web fuzzing during penetration testing, including authenticated fuzzing with raw requests, auto-calibration, and result analysis |
| playwright-skill | General-purpose browser automation using Playwright |
| claude-d3js-skill | Visualizations in d3.js |
| claude-scientific-skills | Comprehensive collection of ready-to-use scientific skills, including working with specialized scientific libraries and databases |
| web-asset-generator | Generates web assets like favicons, app icons, and social media images |
| loki-mode | Multi-agent autonomous startup system - orchestrates 37 AI agents across 6 swarms to build, deploy, and operate a complete startup from PRD to revenue |
| Trail of Bits Security Skills | Security skills for static analysis with CodeQL/Semgrep, variant analysis, code auditing, and vulnerability detection |
| frontend-slides | Create animation-rich HTML presentations -- from scratch or by converting PowerPoint files |
| Expo Skills | Official skills by the Expo team for developing Expo apps |
| shadcn/ui | Give Claude Code context on shadcn components as well as pattern enforcement |

More community skills coming soon! Submit a PR to add your skill.

Skill Tools and UIs

Tools

  • yusufkaraaslan/Skill_Seekers - Convert documentation websites into Claude Skills
  • claude-hud - A Claude Code plugin that shows what's happening -- context usage, active tools, running agents, and todo progress. Always visible below your input.
  • Claude-to-IM - Bridge Claude Code / Codex to IM platforms -- chat with AI coding agents from Telegram, Discord, Feishu/Lark, or QQ.

UI

  • CodePilot - A desktop GUI for Claude Code -- chat, code, and manage projects visually. Built with Electron + Next.js.

Creating and Publishing Skills

Step-by-Step Guide

Method 1: Use skill-creator (Recommended)

The easiest way to create a skill is to use the built-in skill-creator:

  1. Enable the skill-creator skill in Claude
  2. Ask Claude: "Use the skill-creator to help me build a skill for [your task]"
  3. Answer the interactive questions about your workflow
  4. Claude generates the complete skill structure for you

Method 2: Manual Creation

  1. Create folder structure:

    my-skill/
    ├── SKILL.md          # Main skill file with frontmatter
    ├── scripts/          # Optional executable scripts
    │   └── helper.py
    └── resources/        # Optional supporting files
        └── template.json
    
  2. Create SKILL.md with frontmatter:

    ---
    name: my-skill
    description: Brief description for skill discovery (keep concise)
    ---
    
    # Detailed Instructions
    
    Claude will read these instructions when the skill is activated.
    
    ## Usage
    Explain how to use this skill...
    
    ## Examples
    Provide clear examples...

  3. Add executable scripts (optional):

    • Python, JavaScript, or other scripts Claude can execute
    • Reference them in your SKILL.md instructions

  4. Test locally:

    • Install the skill in Claude Code or Claude Desktop
    • Test with relevant tasks
    • Iterate and refine

  5. Share:

    • Publish to GitHub
    • Submit to this awesome list via PR
    • Share with your team via git repos or internal distribution
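
Before testing or sharing a manually created skill, it can help to check that SKILL.md has the frontmatter described in step 2. The sketch below is a deliberately small pre-flight check, not an official validator: it parses only flat `key: value` pairs rather than full YAML, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# The two frontmatter fields shown in the SKILL.md example above.
REQUIRED_FIELDS = ("name", "description")


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs between the leading '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter delimiter")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' delimiter")


def validate_skill_text(text: str) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        fields = parse_frontmatter(text)
    except ValueError as exc:
        return [str(exc)]
    return [f"missing or empty field: {name}"
            for name in REQUIRED_FIELDS if not fields.get(name)]


def validate_skill_dir(skill_dir: str) -> List[str]:
    """Run the same checks against the SKILL.md inside a skill folder."""
    skill_md = Path(skill_dir) / "SKILL.md"
    if not skill_md.is_file():
        return ["SKILL.md not found"]
    return validate_skill_text(skill_md.read_text(encoding="utf-8"))
```

Calling `validate_skill_dir("my-skill")` on the folder from step 1 returns an empty list when both fields are present. A real YAML parser would be the better choice once frontmatter grows beyond simple strings.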

Best Practices

  • Keep descriptions concise - The frontmatter description is used for skill discovery
  • Use clear, actionable instructions - Write instructions as if for a human collaborator
  • Include examples - Show specific examples in your SKILL.md
  • Version your skills - Use git tags for version management
  • Document dependencies - List any prerequisites or required packages
  • Test thoroughly - Verify your skill works across different scenarios

Skills Docs, Tutorials, and Articles

Official Documentation and Resources

Getting Started
Documentation
Repositories and Examples

Tutorials and Guides

Written Tutorials
Video Tutorials

Video tutorials coming soon! Have a good video about Claude Skills? Submit a PR!

Example topics we'd love to see
  • Getting started with Claude Skills
  • Building your first custom skill
  • Skills vs MCP comparison
  • Enterprise deployment strategies

Articles and Blog Posts

Recent Updates in the Skills Ecosystem

November 2025
  • Nov 13: Anthropic publishes Skills Explained - Comprehensive guide covering progressive disclosure architecture, decision matrices for Skills vs Prompts/Subagents/Projects, and best practices
October 2025
  • Oct 18: Major community repositories emerge: obra/superpowers skills library
  • Oct 17: Community publishes practical tutorials on DEV.to and Medium
  • Oct 16: Claude Skills officially announced - Available across Claude.ai, Code, and API
  • Oct 16: Initial skills released including docx, pdf, pptx, xlsx, algorithmic-art, canvas-design, and more

Skills in the Harness Stack

Quick Reference: When to Use What

| Tool | Best For |
| --- | --- |
| Skills | Reusable procedural knowledge across conversations |
| Prompts | One-time instructions and immediate context |
| Projects | Persistent background knowledge within workspaces |
| Subagents | Independent task execution with specific permissions |
| MCP | Connecting Claude to external data sources |

Use Skills when: Capabilities should be accessible to any Claude instance. They're portable expertise.

Use Subagents when: You need self-contained agents designed for specific purposes with independent workflows and restricted tool access.

Combined approach: Subagents can leverage Skills for specialized expertise, merging independence with portable knowledge.

Key insight: If you find yourself typing the same prompt repeatedly across multiple conversations, it's time to create a Skill.

Skills vs MCP (Model Context Protocol)

| Feature | Skills | MCP |
| --- | --- | --- |
| Purpose | Task-specific expertise and workflows | External data/API integration |
| Portability | Same format everywhere (Claude.ai, Code, API) | Requires server configuration |
| Code Execution | Can include executable scripts | Provides tools/resources |
| Token Efficiency | ~100 tokens of metadata until loaded | Varies by implementation |
| Best For | Repeatable tasks, document workflows | Database access, API integrations |

Use Together: Skills can help you build MCP servers. The mcp-builder skill guides the creation of high-quality MCP integrations.

Skills vs System Prompts

| Feature | Skills | System Prompts |
| --- | --- | --- |
| Structure | Folder with YAML frontmatter, instructions, scripts | Plain text instructions |
| Reusability | Version-controlled, shareable, composable | Copy-paste, conversation-specific |
| Loading | On-demand (only when relevant) | Always in context |
| Maintenance | Centralized updates | Manual updates per conversation |
| Composability | Multiple skills stack automatically | Manual combination |

Skills Security, Operations, and FAQ

Security and Best Practices

⚠️ Important: Skills can execute arbitrary code in Claude's environment. Only install skills from trusted sources.

Security Guidelines & Best Practices
Vetting Skills
  • Only install skills from trusted sources
  • Review SKILL.md and all scripts before enabling a skill
  • Be cautious of skills that request sensitive data access
  • Audit carefully before deploying to production or enterprise environments
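
One way to make the review step above repeatable is to inventory every file in a skill folder, hash it, and flag anything that looks executable. The sketch below is illustrative only: the suffix list is an assumption, not a complete definition of "script", and the helper names are hypothetical. It supports a manual review and does not replace one.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

# Illustrative suffixes treated as executable scripts; extend to match your policy.
SCRIPT_SUFFIXES = {".py", ".js", ".ts", ".sh", ".rb"}


def inventory_skill(skill_dir: str) -> List[Dict[str, str]]:
    """List every file in a skill folder with its SHA-256 digest and a coarse kind."""
    root = Path(skill_dir)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            "kind": "script" if path.suffix.lower() in SCRIPT_SUFFIXES else "file",
        })
    return entries


def changed_files(old: List[Dict[str, str]], new: List[Dict[str, str]]) -> List[str]:
    """Return paths that were added, removed, or modified between two inventories."""
    old_hashes = {e["path"]: e["sha256"] for e in old}
    new_hashes = {e["path"]: e["sha256"] for e in new}
    return sorted(p for p in set(old_hashes) | set(new_hashes)
                  if old_hashes.get(p) != new_hashes.get(p))
```

Storing the inventory taken at review time and comparing it with `changed_files` after each update shows exactly which files need a second look.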
Security Concerns
  • Malicious skills may introduce vulnerabilities or enable data exfiltration
  • Prompt injection attacks could be amplified through compromised skills
  • Sandboxing limitations - Understand the security model before enterprise deployment
  • Security research: Weaponizing Claude Code Skills - Analysis of potential security risks
Best Practices
  • Version control - Track all skills in git with proper version tags
  • Code review - Peer review custom skills before team distribution
  • Least privilege - Only grant necessary permissions and access
  • Regular audits - Periodically review installed skills
  • Documentation - Maintain clear documentation for custom skills
  • Testing - Thoroughly test skills in non-production environments first
Enterprise Considerations
  • As of October 2025, Claude.ai does not support centralized admin management for custom skills
  • Use version control and internal repositories for team skill distribution
  • Establish clear policies for skill vetting and approval
  • Monitor skill usage and performance

Troubleshooting

Known Issues & Common Problems
Known Issues
  • Linux path bug (Oct 18, 2025): Agent SDK uses hardcoded macOS paths instead of environment home directory
    • Issue #268
    • Workaround: Manually specify skill paths
  • Enterprise distribution: No centralized admin management yet for custom skills on claude.ai
    • Use git repositories for team distribution
    • API integration provides more control
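
For the Linux path bug listed above, the general remedy is to derive the skills directory from the current user's home directory instead of a fixed path. The sketch below shows only that resolution step. The `SKILLS_DIR` override variable and the function name are hypothetical, `~/.claude/skills` is assumed as the default personal skills location, and how the resolved path is passed to the Agent SDK is not covered here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def personal_skills_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve a per-user skills directory without hardcoding an OS-specific path."""
    env = os.environ if env is None else env
    # Hypothetical override variable, useful in CI or containers.
    override = env.get("SKILLS_DIR")
    if override:
        return Path(override).expanduser()
    # Assumed default location; Path.home() adapts to the current OS and user.
    return Path.home() / ".claude" / "skills"
```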
Common Problems

Skills not appearing in Claude

  • Check Settings > Capabilities to ensure Skills are enabled
  • For Team/Enterprise: Verify admin has enabled Skills organization-wide
  • Restart Claude after installing new skills

Skills not loading/activating

  • Verify SKILL.md has proper YAML frontmatter format
  • Check that name and description fields are present
  • Ensure file structure matches expected format

Permission errors

  • Review admin settings for Team/Enterprise accounts
  • Check file permissions in skill directories
  • Verify API key has appropriate permissions

Skill execution failures

  • Check script dependencies are installed
  • Review error logs for specific issues
  • Test scripts independently outside of Claude
Getting Help

FAQ

Common Questions

Q: How much do skills impact token usage?

A: Skills are highly efficient thanks to progressive disclosure. Each skill uses only ~100 tokens during metadata scanning to determine relevance. When activated, the full skill content loads at <5k tokens. Bundled resources only load as needed.

Q: What's the difference between Claude Skills and Agent Skills?

A: They are the same thing.

Q: Can I share skills with my team?

A: Yes! Skills can be shared via:

  • Git repositories (recommended)
  • Internal file sharing
  • Claude API for programmatic distribution
  • Enterprise-wide deployment features (coming soon)

Q: Do skills work with all Claude models?

A: This list documents availability by plan rather than by model: Skills are available for Pro, Max, Team, and Enterprise users. Free tier users do not have access to Skills.

Q: Can skills call external APIs?

A: Yes, skills can include scripts that call external APIs. For complex API integrations, consider using MCP (Model Context Protocol) alongside skills.

Q: How does Claude decide which skill to use?

A: Claude scans all available skills' frontmatter (name and description), evaluates relevance to the current task, then loads the full content of relevant skills. Multiple skills can be loaded and composed together automatically.

Q: Can I use Skills and MCP together?

A: Absolutely! They complement each other. Use Skills for task-specific workflows and MCP for external data/API integration. The mcp-builder skill can even help you build MCP servers.

Q: Are there any costs beyond my Claude subscription?

A: No additional costs for using official skills. Community and custom skills are free to use, though some may require external services (APIs, databases, etc.) that have their own costs.

Q: Can I monetize custom skills?

A: Currently, there is no official marketplace for paid skills. Anthropic has mentioned plans for community contributions and a potential marketplace in the future.

Q: How do I update a skill?

A: For skills from git repositories, pull the latest changes. For manually installed skills, replace the skill folder with the updated version. Always test updates in a non-production environment first.

4. MCP and Capability Fabric

| Project | Why it matters for agent harnesses |
| --- | --- |
| modelcontextprotocol/modelcontextprotocol | The MCP spec and docs repo; the foundation for modern tool, resource, and prompt integration. |
| modelcontextprotocol/servers | Official reference servers plus the gateway into the wider MCP registry and ecosystem. |
| modelcontextprotocol/typescript-sdk | Official TypeScript SDK for writing MCP clients and servers. |
| modelcontextprotocol/python-sdk | Official Python SDK for MCP clients and servers. |
| github/github-mcp-server | The most important coding-focused MCP server: repos, files, issues, PRs, Actions, security, and more. |
| microsoft/playwright-mcp | Browser automation via MCP using accessibility snapshots instead of pixel-only interaction. |
| punkpeye/awesome-mcp-servers | Broad community index for MCP servers across every domain. |
| CodeAlive-AI/codealive-mcp | A strong example of an MCP-first context engine for large codebases. |

Emerging Repos in This Layer

| Project | Stars | Layer | Why it matters |
| --- | --- | --- | --- |
| Flux159/mcp-chat | 134 | MCP client / testing | Useful for testing and evaluating MCP servers and agent setups from the client side. |
| cs50victor/claude-code-teams-mcp | 219 | MCP orchestration | Above the target range, but a good example of using MCP to expose team orchestration patterns to harnesses. |
| amxv/mcp-manager | 285 | MCP management UI | Above the target range, but increasingly relevant as harnesses need GUI-level MCP fleet management. |

5. Memory, State, and Context Systems

| Project | Why it matters for agent harnesses |
| --- | --- |
| letta-ai/letta | Best viewed as a memory-first platform for persistent, stateful agents. |
| mem0ai/mem0 | Universal memory layer for user, session, and agent state. |
| getzep/graphiti | Real-time knowledge graphs for agent memory, retrieval, and historical reasoning; also includes an MCP server. |
| langchain-ai/langgraph | Important here for checkpointing, resumability, and explicit state graphs. |
| CodeAlive-AI/codealive-mcp | Worth revisiting here as a context-engine layer rather than just an MCP endpoint. |

Emerging Repos in This Layer

| Project | Stars | Layer | Why it matters |
| --- | --- | --- | --- |
| srikanthbellary/openstinger | 114 | Memory harness | Explicitly positions itself as a portable memory harness for agents. |
| aayoawoyemi/Ori-Mnemos | 139 | Persistent memory | Local-first persistent memory system built specifically for agentic workflows. |
| AGI-is-going-to-arrive/Memory-Palace | 211 | Long-term memory OS | Above the target range, but highly relevant as a memory operating system concept for AI agents. |

6. Browser, Sandbox, Execution, and Operator Surfaces

| Project | Why it matters for agent harnesses |
| --- | --- |
| browser-use/browser-use | Makes websites accessible to agents; useful for end-to-end verification and browser action loops. |
| e2b-dev/E2B | Secure isolated cloud sandboxes for running AI-generated code. |
| SWE-agent/SWE-ReX | Runtime interface for sandboxed shell execution, local or remote, with strong parallelization support. |
| microsoft/playwright-mcp | Also belongs here as the cleanest browser substrate for MCP-native agents. |
| clawdbot/clawdbot | Interesting if your harness needs channels, device actions, voice, and always-on local control. |

Emerging Repos in This Layer

| Project | Stars | Layer | Why it matters |
| --- | --- | --- | --- |
| llm-platform-security/SecGPT | 109 | Isolation architecture | Focuses on execution isolation for LLM-based agent systems. |
| Cloudgeni-ai/infrastructure-agents-guide | 125 | Safe operations | Guide repo centered on architecture, sandboxing, credentials, change control, and observability for infra agents. |
| PACHAKUTlQ/ClaudeCage | 138 | Sandboxed runtime | Portable sandbox wrapper for Claude Code style workflows. |
| mattolson/agent-sandbox | 157 | Local sandbox | Local secure dev environment for agent collaboration. |
| OpenSource03/harnss | 147 | Desktop harness shell | Desktop UI for Claude Code, Codex, and ACP-compatible agents with terminal, browser, Git, and MCP visualization. |
| EDEAI/OpenFlux | 173 | Desktop agent client | Slightly above the target range, but notable for long-term memory, browser automation, and tool orchestration in a local client. |

7. Observability, Evals, and Guardrails

| Project | Why it matters for agent harnesses |
| --- | --- |
| langfuse/langfuse | Full LLM engineering platform for tracing, evals, prompts, datasets, and production debugging. |
| Arize-ai/phoenix | Open-source observability and evaluation platform for tracing and troubleshooting agent runs. |
| promptfoo/promptfoo | Practical eval, CI, red-teaming, and vulnerability scanning for prompts, RAG, and agents. |
| truera/trulens | Evaluation and tracking framework for LLM applications and agents. |
| invariantlabs-ai/invariant | Rule-based guardrails layer that can sit between your app and MCP or LLM providers. |
| invariantlabs-ai/mcp-scan | MCP-specific security scanner and proxy for inspection, logging, and runtime enforcement. |

Emerging Repos in This Layer

| Project | Stars | Layer | Why it matters |
| --- | --- | --- | --- |
| METR/vivaria | 135 | Evaluation harness | METR's evaluation and elicitation research tooling; highly relevant for serious agent evaluation. |
| scabench-org/scabench | 105 | Audit-agent eval | Framework for evaluating AI audit agents on recent real-world data. |
| philschmid/ai-agent-benchmark-compendium | 112 | Benchmark index | Curated benchmark map for agent evaluation across coding, tool use, reasoning, and computer interaction. |
| arklexai/arksim | 112 | Error simulation / eval | Helps surface agent failures before they hit real users. |
| Mengmeara/agent-safe-probe-x | 83 | Safety evaluation | Focused framework for automated safety evaluation of intelligent agents. |

8. Reference Harness Compositions

These are the former Suggested Build Stacks, reframed as cross-layer harness compositions. Each composition combines multiple layers from the map above rather than acting like an isolated appendix.

1. Minimal Coding Harness

Best when you want the smallest useful end-to-end coding loop.

2. MCP-Native Harness

Best when MCP is your primary capability fabric and you want the rest of the harness to follow that design.

3. Security-First Harness

Best when approval, isolation, scanning, and secure workflows matter more than raw autonomy.

Related Research in This Repo

Contributing

PRs are welcome. Please prefer:

  • official upstream repos over mirrors
  • registries over long-tail one-off entries when a category is exploding
  • concise descriptions that explain why a repo is harness-related
  • exact links to GitHub repos, not marketing pages

If you are submitting a Skill, keep the skill catalog conventions intact and place the new entry where it best fits within the Skills and Reusable Behavior Packs layer.

See CONTRIBUTING.md for submission guidance.
