This is an example repo demonstrating integration between OpenEnv and AgentBeats.

    Kickoff: "Assess agent at url=..."
            │
            v
    ┌────────────────┐  (1) init / reset()   ┌───────────┐
    │                │──────────────────────>│           │
    │  Assessor A2A  │  (return StepResult   │  OpenEnv  │
    │                │   from reset)         │           │
    └────────────────┘                       └───────────┘
       │     │                                     ^
       │     │                                     │ exposes
       │     │                                     v
       │     │  (2) Create New MCP server     ┌─────────┐        ┌───────────────┐
       │     └───────────────────────────────>│         │───────>│               │
       │         & connect to gateway         │ New MCP │        │ MCP-X Gateway │
       │                                      │         │<───────│               │
       │                                      └─────────┘        └───────────────┘
       │                                       (done or                  ^
       │                                        timeout)                 │ (5)
       │                                                                 │ step()
       │                                                                 │ state()
       │                                                                 │
       │  (3) Send task instructions                                     │
       │  (include reset() StepResult)        ┌────────────────┐         │
       └─────────────────────────────────────>│                │─────────┘
                                              │  Assessee A2A  │
                            (4) "Ok will do"  │                │
                            <─────────────────│                │
                                              └────────────────┘

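
The numbered flow in the diagram can be sketched in plain Python. This is a minimal, in-process illustration only: `CountdownEnv`, `EchoAssessee`, and `run_assessment` are hypothetical names invented for this sketch, the `StepResult` dataclass is a simplified stand-in, and a plain dict of callables takes the place of the MCP server and the MCP-X gateway. It does not use the real OpenEnv, A2A, or MCP APIs.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Simplified stand-in for the result returned by reset() and step()."""
    observation: str
    reward: float = 0.0
    done: bool = False


@dataclass
class CountdownEnv:
    """Hypothetical toy environment: the episode ends after `remaining` steps."""
    remaining: int = 3
    steps_taken: int = 0

    def reset(self) -> StepResult:
        self.steps_taken = 0
        return StepResult(observation=f"{self.remaining} steps remaining")

    def step(self, action: str) -> StepResult:
        self.steps_taken += 1
        left = self.remaining - self.steps_taken
        return StepResult(f"did {action!r}, {left} steps remaining", 1.0, left <= 0)

    def state(self) -> dict:
        return {"steps_taken": self.steps_taken}


class EchoAssessee:
    """Hypothetical assessee that acknowledges the task and emits dummy actions."""

    def acknowledge(self, instructions: str) -> str:
        return "Ok will do"

    def next_action(self, state: dict) -> str:
        return f"action-{state['steps_taken']}"


def run_assessment(env, assessee, max_steps: int = 10) -> list:
    log = []
    # (1) The assessor initialises the environment.
    first = env.reset()
    log.append("(1) reset")
    # (2) step() and state() are exposed as tools (a dict here, MCP in the repo).
    tools = {"step": env.step, "state": env.state}
    log.append("(2) tools exposed: " + ", ".join(sorted(tools)))
    # (3) The task instructions carry the result of reset().
    instructions = f"Solve the task. Initial observation: {first.observation}"
    log.append("(3) instructions sent")
    # (4) The assessee acknowledges.
    log.append("(4) " + assessee.acknowledge(instructions))
    # (5) The assessee drives the environment until done or timeout.
    result = first
    for _ in range(max_steps):
        if result.done:
            break
        result = tools["step"](assessee.next_action(tools["state"]()))
    log.append(f"(5) finished after {env.steps_taken} steps, done={result.done}")
    return log
```

Calling `run_assessment(CountdownEnv(remaining=3), EchoAssessee())` returns one log line per numbered step of the diagram. The `max_steps` cap plays the role of the "done or timeout" condition.
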
- Start MCP-X (at 9000 by default):

      cd mcp-x
      uv run python mcp_x.py

- Start the assessor agent (at 9999 by default):

      cd eb_assessor
      uv run python main.py

- Start the assessee agent (at 9990 by default). Three examples are provided; run any one of them, or all of them, to see how different assessees interact with the assessor and the environment.
  - eb_assessee_gym: a gym-style agent implementation that uses MCP tools to interact with the environment

        cd eb_assessee_gym
        uv run python main.py

  - eb_assessee_pure_mcp: a slightly modified, LLM-driven A2A agent taken directly from Google's repo, built without any LLM framework

        # Remember to configure the `.env` file first so that it includes the key
        cd eb_assessee_pure_mcp/a2a-mcp-without-framework
        uv run --env-file ../.env python -m src.no_llm_framework.server.__main__ --port 9990

  - eb_assessee_human: a human-in-the-loop example where the human can trigger MCP calls manually

        # Remember to run `npx @modelcontextprotocol/inspector` first to install the inspector for MCP debugging
        cd eb_assessee_human
        uv run python main.py

- Kick off the evaluation:

      cd eb_kickoff
      uv run python main.py

- The env init message (from `reset()`) is passed to the assessee in the task instruction message.
- The assessee agent is provided with the OpenEnv interfaces as MCP tools (`state()`, `step()`).

This approach could be generic for any OpenEnv environment (without Python type enforcement).
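
One way to read "generic, without Python type enforcement" is that the tool layer exchanges plain JSON and never needs to know an environment's action or observation classes. The sketch below illustrates that idea under stated assumptions: `to_jsonable`, `make_generic_tools`, and `ListEnv` are hypothetical names made up for this example, not functions from this repo, OpenEnv, or any MCP library, and the wrapper accepts `state` either as a method or as a plain attribute because the exact interface is assumed here.

```python
import json
from dataclasses import asdict, is_dataclass
from typing import Any, Callable, Dict


def to_jsonable(value: Any) -> Any:
    """Best-effort conversion of an arbitrary result into JSON-friendly data."""
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    return repr(value)  # fall back to a string for unknown types


def make_generic_tools(env: Any) -> Dict[str, Callable[..., str]]:
    """Wrap any object with step/state as JSON-in, JSON-out callables."""

    def step(action_json: str) -> str:
        action = json.loads(action_json)  # no schema or type check on purpose
        return json.dumps(to_jsonable(env.step(action)))

    def state() -> str:
        current = env.state
        if callable(current):  # assumption: state may be a method or an attribute
            current = current()
        return json.dumps(to_jsonable(current))

    return {"step": step, "state": state}


class ListEnv:
    """Hypothetical toy environment that records every action it receives."""

    def __init__(self) -> None:
        self.history = []

    def step(self, action: Any) -> dict:
        self.history.append(action)
        return {"observation": action, "reward": 0.0, "done": len(self.history) >= 2}

    def state(self) -> dict:
        return {"step_count": len(self.history)}
```

For example, `make_generic_tools(ListEnv())["step"]('{"move": "left"}')` returns a JSON string containing the observation, reward, and done flag. The trade-off of skipping type enforcement is that malformed actions are only caught by the environment itself, not by the tool layer.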