A production-style multi-agent orchestration prototype where a planner agent decomposes complex requests, discovers specialised agents through Agent Cards, delegates tasks through an A2A-inspired protocol, and gives agents uniform access to tools through MCP-style servers.
Single-agent assistants are good at simple requests, but they often become brittle when a task needs planning, retrieval, data lookup, web-style search, evidence synthesis, and final summarisation. AgentMesh demonstrates a cleaner architecture:
- one Planner Agent decomposes the user request,
- specialised agents execute sub-tasks,
- agents publish Agent Cards for discovery,
- delegation happens through an A2A-style task interface,
- tools are exposed through MCP-style servers,
- evaluation compares single-agent and multi-agent execution on multi-step tasks.
The goal is not to implement the A2A or MCP specifications in full. The goal is to show a practical engineering pattern that recruiters and engineering teams can understand: keep agent communication (A2A-style) separate from tool execution (MCP-style).
| Capability | What AgentMesh implements |
|---|---|
| Agent discovery | Agent Cards served from a registry and API endpoint |
| A2A-style delegation | Task envelopes, task states, agent messages, artifacts, and trace IDs |
| MCP-style tool access | Uniform `list_tools` and `call_tool` interface for SQL, vector search, and REST/mock search |
| Multi-agent orchestration | Planner delegates to retrieval, search, SQL, and summarisation agents |
| Observability | Step-level traces, tool calls, latency, agent routing decisions |
| Evaluation | Single-agent vs multi-agent comparison on 50+ multi-step tasks |
| Cloud-readiness | AWS SAM/Terraform skeleton for API Gateway, Lambda-style agents, DynamoDB state, Bedrock model calls |
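One way to picture the A2A-style delegation row is as a task envelope that moves through a small state machine and carries a shared trace ID. The sketch below is hypothetical and is not the contents of `protocol.py`. The names `TaskEnvelope`, `TaskState`, and `child` are assumptions, as is the reduced four-state lifecycle, which is only loosely modelled on A2A task states.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from enum import Enum


class TaskState(str, Enum):
    # Reduced lifecycle loosely inspired by A2A task states (an assumption).
    SUBMITTED = "submitted"
    WORKING = "working"
    COMPLETED = "completed"
    FAILED = "failed"


# Legal transitions; terminal states have no outgoing edges.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


@dataclass
class Message:
    sender: str
    text: str


@dataclass
class TaskEnvelope:
    # Hypothetical envelope: who delegated which skill to whom, plus a trace ID.
    skill_id: str
    sender: str
    recipient: str
    input_text: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    task_id: str = field(default_factory=lambda: f"task_{uuid.uuid4().hex[:8]}")
    state: TaskState = TaskState.SUBMITTED
    messages: list[Message] = field(default_factory=list)
    artifacts: list[dict] = field(default_factory=list)

    def transition(self, new_state: TaskState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state

    def child(self, skill_id: str, recipient: str, input_text: str) -> TaskEnvelope:
        # Sub-tasks inherit the trace ID so one user request can be followed across agents.
        return TaskEnvelope(skill_id, self.recipient, recipient, input_text, trace_id=self.trace_id)
```

Because the trace ID propagates through `child`, every sub-task the planner creates can be tied back to one request in the step-level traces.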
```mermaid
flowchart LR
    U[User Request] --> API[FastAPI Gateway]
    API --> P[Planner Agent]
    P --> REG[Agent Registry / Agent Cards]
    REG --> R[Retrieval Agent]
    REG --> S[Search Agent]
    REG --> Q[SQL Agent]
    REG --> SUM[Summarisation Agent]
    R --> MCP1[MCP Vector Search Server]
    S --> MCP2[MCP REST/Search Server]
    Q --> MCP3[MCP SQL Server]
    R --> SUM
    S --> SUM
    Q --> SUM
    SUM --> API
    API --> OUT[Final Answer + Trace]
```
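In the diagram, the retrieval, search, and SQL agents all feed the summariser. That fan-out/fan-in can be sketched with `asyncio.gather`. The sketch assumes the three specialists are independent, so they can run concurrently; the diagram itself does not say this. All agent functions here are hypothetical stubs that return canned strings.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Hypothetical specialist signature: sub-task text in, finding text out.
Agent = Callable[[str], Awaitable[str]]


async def retrieval_agent(task: str) -> str:
    return f"[retrieval] runbook notes for: {task}"


async def search_agent(task: str) -> str:
    return f"[search] status snippets for: {task}"


async def sql_agent(task: str) -> str:
    return f"[sql] ticket rows for: {task}"


async def summariser(request: str, findings: list[str]) -> str:
    # Stand-in for the summarisation agent: joins evidence in a fixed order.
    return f"Answer to {request!r} using {len(findings)} findings:\n" + "\n".join(findings)


async def orchestrate(request: str, specialists: dict[str, Agent]) -> str:
    names = sorted(specialists)  # deterministic order for traces and tests
    # Fan out: run the specialists concurrently; gather preserves argument order.
    findings = await asyncio.gather(*(specialists[n](request) for n in names))
    # Fan in: the summariser sees every finding.
    return await summariser(request, list(findings))


if __name__ == "__main__":
    agents = {"retrieval": retrieval_agent, "search": search_agent, "sql": sql_agent}
    print(asyncio.run(orchestrate("critical tickets", agents)))
```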
```bash
agentmesh demo "Which enterprise customers had open critical tickets, what internal docs explain the fix, and what should the account manager send them?"
```

Example result:

```text
Planner created 4 steps:
1. Query SQL tickets and accounts
2. Retrieve internal incident/runbook knowledge
3. Search external-style product status snippets
4. Summarise customer-facing action plan

Final answer:
Two enterprise customers have unresolved critical issues. The strongest remediation evidence is in the cache invalidation runbook and the API timeout incident note. The account manager should send a short update acknowledging impact, explaining the mitigation window, and offering a technical follow-up.
```
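The project runs without paid LLM APIs, so a deterministic, rule-based planner is one plausible way to produce a plan like the one above. This is not necessarily how `planner.py` works. The keyword rules and the agent names other than `retrieval-agent` are hypothetical. They were chosen so that the demo request yields the same four steps in the same order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    agent: str
    instruction: str


# Hypothetical routing rules: (trigger keywords, agent, step instruction).
RULES = [
    (("customer", "ticket", "account"), "sql-agent", "Query SQL tickets and accounts"),
    (("doc", "runbook", "fix", "incident"), "retrieval-agent", "Retrieve internal incident/runbook knowledge"),
    (("status", "outage", "critical"), "search-agent", "Search external-style product status snippets"),
]


def plan(request: str) -> list:
    """Return the ordered steps whose keywords appear in the request."""
    text = request.lower()
    steps = [Step(agent, instr) for keywords, agent, instr in RULES
             if any(k in text for k in keywords)]
    # The summariser always runs last, so every plan ends with a synthesised answer.
    steps.append(Step("summarisation-agent", "Summarise customer-facing action plan"))
    return steps
```

A production planner would more likely ask an LLM to decompose the request against the skills advertised in the Agent Cards. A fixed rule table like this one mainly makes evaluation runs reproducible.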
The repository includes a deterministic evaluation set with 50 multi-step tasks. The benchmark compares a single generic agent against the routed multi-agent system.
| System | Tool-selection accuracy | Task completion | Avg. steps | Avg. latency |
|---|---|---|---|---|
| Single generic agent | 62.0% | 58.0% | 2.1 | 0.31s |
| AgentMesh multi-agent | 88.0% | 84.0% | 3.7 | 0.46s |
Interpretation: the multi-agent system takes more steps and adds some latency, but it selects tools more accurately and completes compound requests more reliably. Because the included evaluator is deterministic, the project can be run without paid LLM APIs.
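The trade-off can be quantified with relative changes computed directly from the benchmark table. The helper name is arbitrary; the figures are copied from the table.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


# Figures from the benchmark table: (single generic agent, AgentMesh multi-agent).
TABLE = {
    "tool_selection_accuracy_pct": (62.0, 88.0),
    "task_completion_pct": (58.0, 84.0),
    "avg_steps": (2.1, 3.7),
    "avg_latency_s": (0.31, 0.46),
}

if __name__ == "__main__":
    for metric, (single, multi) in TABLE.items():
        print(f"{metric}: {single} -> {multi} ({relative_change(single, multi):+.0f}% relative)")
```

Tool-selection accuracy and task completion each rise by 26 percentage points, roughly +42% and +45% in relative terms. Over the same comparison, average steps rise by about 76% and latency by about 48%.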
```text
agentmesh/
├── src/agentmesh/
│   ├── api.py                  # FastAPI gateway
│   ├── cli.py                  # CLI for demo, run, evaluate
│   ├── orchestrator.py         # Planner-led multi-agent orchestration
│   ├── state.py                # In-memory/DynamoDB-style task state
│   ├── schemas.py              # Shared data contracts
│   ├── agents/
│   │   ├── base.py             # Base agent contract
│   │   ├── planner.py          # Planner / task decomposer
│   │   ├── retrieval_agent.py  # Knowledge retrieval specialist
│   │   ├── search_agent.py     # External-search specialist
│   │   ├── sql_agent.py        # Structured data specialist
│   │   └── summarizer_agent.py # Final synthesis specialist
│   ├── a2a/
│   │   ├── cards.py            # Agent Card definitions
│   │   ├── registry.py         # Agent discovery registry
│   │   └── protocol.py         # A2A-style task/message envelopes
│   ├── mcp/
│   │   ├── server.py           # MCP-style server interface
│   │   └── client.py           # MCP-style client interface
│   ├── tools/
│   │   ├── sql_tools.py        # SQLite tool server
│   │   ├── vector_tools.py     # TF-IDF knowledge retrieval tool server
│   │   └── rest_tools.py       # Mock REST/search/status tool server
│   ├── evaluation/
│   │   ├── dataset.py          # Generates/evaluates 50+ tasks
│   │   └── evaluator.py        # Single vs multi-agent scoring
│   └── cloud/
│       ├── bedrock_client.py   # Bedrock-compatible abstraction
│       └── dynamodb_state.py   # DynamoDB-state adapter skeleton
├── data/
│   ├── knowledge/              # Demo runbooks and incident notes
│   └── eval/                   # Multi-step evaluation dataset
├── docs/
│   ├── ARCHITECTURE.md
│   ├── MCP_A2A_DESIGN.md
│   ├── EVALUATION.md
│   └── AWS_DEPLOYMENT.md
├── infra/
│   ├── aws-sam/template.yaml
│   └── terraform/main.tf
├── tests/
├── reports/benchmark_summary.md
├── Dockerfile
├── docker-compose.yml
├── Makefile
└── README.md
```
```bash
git clone https://github.com/<your-username>/agentmesh.git
cd agentmesh
python -m venv .venv
source .venv/bin/activate        # macOS/Linux
# .venv\Scripts\Activate.ps1     # Windows PowerShell
pip install -e ".[dev]"
```

Run the CLI demo:

```bash
agentmesh demo "Which customers have critical open tickets and what should we tell them?"
```

Start the API server:

```bash
uvicorn agentmesh.api:app --reload
```

Open the interactive API docs:
http://localhost:8000/docs
```bash
curl -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{
    "request": "Find critical customer issues, retrieve the right runbook, and draft an account-manager summary.",
    "mode": "multi_agent"
  }'
```

Response shape:
```json
{
  "task_id": "task_...",
  "mode": "multi_agent",
  "final_answer": "...",
  "steps": [...],
  "tool_calls": [...],
  "metrics": {
    "latency_seconds": 0.46,
    "agents_used": 4,
    "tools_called": 5
  }
}
```

Each agent publishes an Agent Card describing its capabilities, input/output modes, and skills. An abridged card for the retrieval agent:
```json
{
  "name": "retrieval-agent",
  "description": "Retrieves relevant internal knowledge from vector-search tools.",
  "skills": [
    {
      "id": "retrieve_knowledge",
      "name": "Retrieve Knowledge",
      "description": "Finds relevant runbooks, incidents, and policy snippets."
    }
  ]
}
```

This makes routing explicit. The planner needs no hard-coded knowledge of how each agent is implemented: it discovers agent capabilities from the cards and delegates based on task intent.
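Metadata-driven discovery can be sketched as a registry that scores each card's skills against the task intent. The scoring here is crude substring word overlap, used as a stand-in for whatever matching `registry.py` actually does (an LLM or embedding-based matcher would be more robust), and the class names are hypothetical. The `retrieval-agent` card mirrors the example above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    id: str
    name: str
    description: str


@dataclass(frozen=True)
class AgentCard:
    name: str
    description: str
    skills: tuple[Skill, ...]


class AgentRegistry:
    """Hypothetical in-memory registry that routes by word overlap with skill text."""

    def __init__(self) -> None:
        self._cards: dict[str, AgentCard] = {}

    def register(self, card: AgentCard) -> None:
        self._cards[card.name] = card

    def discover(self, intent: str) -> AgentCard | None:
        words = set(intent.lower().split())

        def score(card: AgentCard) -> int:
            # Count intent words that occur (as substrings) in the card's skill text.
            text = " ".join(f"{s.name} {s.description}" for s in card.skills).lower()
            return sum(1 for w in words if w in text)

        best = max(self._cards.values(), key=score, default=None)
        # Return nothing rather than a random agent when no skill matches at all.
        return best if best is not None and score(best) > 0 else None
```

Because the planner only calls `discover`, adding a new specialist means registering another card; the planner's code does not change.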
AgentMesh exposes tools behind a uniform interface:

```python
# Runs inside an async function; mcp_client is the MCP-style client (see src/agentmesh/mcp/client.py).
tools = await mcp_client.list_tools(server="sql")
result = await mcp_client.call_tool(
    server="sql",
    tool="query_tickets",
    arguments={"severity": "critical", "status": "open"},
)
```

Included tool servers:
| MCP-style server | Tools |
|---|---|
| `sql` | `query_tickets`, `query_accounts`, `list_tables` |
| `vector` | `retrieve_documents`, `list_documents` |
| `rest` | `search_status`, `get_service_health`, `fetch_competitor_signal` |
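The uniform interface can be approximated in-process with a tiny server/client pair. This is not the official MCP SDK. `ToolServer`, `ToolClient`, and the toy ticket rows are hypothetical; only the `list_tools`/`call_tool` call shape and the `query_tickets` arguments come from the example above.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

ToolFn = Callable[..., Awaitable[Any]]


class ToolServer:
    """Hypothetical MCP-style server: a named set of async tools."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._tools: dict[str, ToolFn] = {}

    def tool(self, fn: ToolFn) -> ToolFn:
        # Decorator: register a coroutine function under its own name.
        self._tools[fn.__name__] = fn
        return fn

    def list_tools(self) -> list[str]:
        return sorted(self._tools)

    async def call_tool(self, tool: str, arguments: dict[str, Any]) -> Any:
        if tool not in self._tools:
            raise KeyError(f"{self.name} has no tool {tool!r}")
        return await self._tools[tool](**arguments)


class ToolClient:
    """Routes calls to servers by name, mirroring the list_tools/call_tool shape."""

    def __init__(self, *servers: ToolServer) -> None:
        self._servers = {s.name: s for s in servers}

    async def list_tools(self, server: str) -> list[str]:
        return self._servers[server].list_tools()

    async def call_tool(self, server: str, tool: str, arguments: dict[str, Any]) -> Any:
        return await self._servers[server].call_tool(tool, arguments)


sql = ToolServer("sql")
TICKETS = [  # toy rows standing in for the SQLite demo data
    {"id": 1, "severity": "critical", "status": "open"},
    {"id": 2, "severity": "low", "status": "open"},
]


@sql.tool
async def query_tickets(severity: str, status: str) -> list[dict]:
    return [t for t in TICKETS if t["severity"] == severity and t["status"] == status]


async def main() -> None:
    client = ToolClient(sql)
    print(await client.list_tools(server="sql"))
    print(await client.call_tool(server="sql", tool="query_tickets",
                                 arguments={"severity": "critical", "status": "open"}))


if __name__ == "__main__":
    asyncio.run(main())
```

Agents only ever hold a `ToolClient`, so a new tool is a new decorated function on some server and the agent logic stays untouched.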
Run the benchmark:

```bash
agentmesh evaluate --output reports/eval_results.csv
```

The evaluator measures:
- tool-selection accuracy
- agent-routing accuracy
- task completion
- average step count
- latency
- unnecessary-tool rate
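The definitions below are one reasonable reading of the metric list, not necessarily what `evaluator.py` computes. A task counts as correct tool selection when every expected tool was called. The unnecessary-tool rate is the share of all calls that went to tools outside the expected set. `TaskResult` and `score` are hypothetical names. Agent-routing accuracy would follow the same expected-versus-actual pattern.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    expected_tools: frozenset[str]   # tools the task's reference plan requires
    called_tools: tuple[str, ...]    # tools actually called, in order
    completed: bool                  # whether the final answer passed the task's check
    steps: int                       # orchestration steps taken
    latency_s: float


def score(results: list[TaskResult]) -> dict[str, float]:
    """Aggregate per-task results; assumes a non-empty list."""
    n = len(results)
    total_calls = sum(len(r.called_tools) for r in results)
    # Correct tool selection: every expected tool appears among the calls.
    tool_ok = sum(r.expected_tools <= set(r.called_tools) for r in results)
    # Unnecessary calls: calls to tools outside the expected set.
    extra = sum(t not in r.expected_tools for r in results for t in r.called_tools)
    return {
        "tool_selection_accuracy": tool_ok / n,
        "task_completion": sum(r.completed for r in results) / n,
        "avg_steps": sum(r.steps for r in results) / n,
        "avg_latency_s": sum(r.latency_s for r in results) / n,
        "unnecessary_tool_rate": extra / total_calls if total_calls else 0.0,
    }
```

Scoring both systems with the same function over the same task list is what makes the single-agent versus multi-agent comparison apples-to-apples.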
The project intentionally includes both a single-agent baseline and the multi-agent orchestrator so the improvement is measurable instead of only described.
The local architecture maps cleanly to AWS:
| Local component | AWS equivalent |
|---|---|
| FastAPI gateway | API Gateway + Lambda adapter |
| Agents | Lambda functions |
| Shared state | DynamoDB |
| Tool calls | MCP server Lambdas / internal services |
| LLM reasoning | Amazon Bedrock foundation models |
| Logs/traces | CloudWatch Logs |
The repository includes both AWS SAM and Terraform starter infrastructure. It is intentionally lightweight, so it can be reviewed as architecture without requiring cloud credentials.
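Swapping local state for DynamoDB is simplest if agents depend only on a small store interface. In the hypothetical sketch below, the in-memory store rejects stale writes with a version check. A DynamoDB-backed adapter could enforce the same rule with a conditional write (`ConditionExpression` on `PutItem`). The names `TaskStore`, `InMemoryTaskStore`, and `VersionConflict` are assumptions.

```python
from __future__ import annotations

import copy
from typing import Any, Protocol


class TaskStore(Protocol):
    def get(self, task_id: str) -> dict[str, Any] | None: ...
    def put(self, task_id: str, item: dict[str, Any], expected_version: int | None) -> int: ...


class VersionConflict(Exception):
    pass


class InMemoryTaskStore:
    """Hypothetical local store with a per-task version for optimistic locking."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, Any]] = {}

    def get(self, task_id: str) -> dict[str, Any] | None:
        item = self._items.get(task_id)
        # Return a copy so callers cannot mutate stored state without a put.
        return copy.deepcopy(item) if item is not None else None

    def put(self, task_id: str, item: dict[str, Any], expected_version: int | None) -> int:
        current = self._items.get(task_id)
        current_version = current["version"] if current is not None else None
        # Reject writes based on a stale read, as a conditional write would.
        if current_version != expected_version:
            raise VersionConflict(f"{task_id}: expected {expected_version}, found {current_version}")
        new_version = (current_version or 0) + 1
        self._items[task_id] = {**copy.deepcopy(item), "version": new_version}
        return new_version
```

The version check matters once agents run as separate Lambda functions. In that setup, two agents can read the same task and race to update it.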
- Agent communication and tool execution are separate concerns.
- Agent discovery should be metadata-driven, not hard-coded.
- Tools should be added without rewriting agent logic.
- Planning should be observable, not hidden in one opaque prompt.
- Evaluation should compare architectures, not just model outputs.
`multi-agent`, `mcp`, `a2a`, `agentic-ai`, `agent-orchestration`, `fastapi`, `aws-bedrock`, `lambda`, `dynamodb`, `rag`, `python`, `llmops`
- Replace the in-process A2A transport with agent-to-agent calls over HTTP
- Add signed Agent Cards and capability allowlists
- Replace the MCP-style servers with implementations built on the official MCP SDK
- Add Bedrock Converse API execution path
- Add LangGraph backend adapter
- Add distributed tracing with OpenTelemetry
- Add React trace viewer
- Add human approval step for sensitive tool calls
AgentMesh is a prototype for agent orchestration and evaluation. It should not be connected to sensitive enterprise tools without authentication, authorization, audit logging, tool allowlists, input validation, and human approval for high-impact actions.
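The allowlist and human-approval controls mentioned above could take the form of a deny-by-default check that runs before every tool call. This is an illustration only. The policy tables and function names are hypothetical, and the roadmap lists capability allowlists and human approval as future work, so AgentMesh does not necessarily include this check today.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical policy: which (server, tool) pairs each agent may call.
ALLOWLIST: dict[str, set[tuple[str, str]]] = {
    "sql-agent": {("sql", "query_tickets"), ("sql", "query_accounts")},
    "search-agent": {("rest", "search_status")},
}
# Calls that additionally need a human to approve each invocation.
NEEDS_APPROVAL: set[tuple[str, str]] = {("sql", "query_accounts")}


class ToolCallDenied(Exception):
    pass


def authorize(agent: str, server: str, tool: str, approve: Callable[[str], bool]) -> None:
    """Raise ToolCallDenied unless the call is allowlisted (and approved, if required)."""
    key = (server, tool)
    # Deny by default: unknown agents get an empty allowlist.
    if key not in ALLOWLIST.get(agent, set()):
        raise ToolCallDenied(f"{agent} may not call {server}.{tool}")
    if key in NEEDS_APPROVAL and not approve(f"{agent} wants to call {server}.{tool}"):
        raise ToolCallDenied(f"human reviewer rejected {server}.{tool}")
```

The `approve` callback could be a CLI prompt locally or a queued review step in a deployed system. Keeping it injectable also makes the policy testable.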
MIT License.